Abstract

Given a sample from a probability measure with support on a submanifold in 
Euclidean space one can construct a neighborhood graph which can be seen as an 
approximation of the submanifold. The graph Laplacian of such a graph is used in 
several machine learning methods like semi-supervised learning, dimensionality reduc- 
tion and clustering. In this paper we determine the pointwise limit of three different 
graph Laplacians used in the literature as the sample size increases and the neighbor- 
hood size approaches zero. We show that for a uniform measure on the submanifold all 
graph Laplacians have the same limit up to constants. However in the case of a non- 
uniform measure on the submanifold only the so called random walk graph Laplacian 
converges to the weighted Laplace-Beltrami operator. 

1 Introduction 

In recent years, methods based on graph Laplacians have become increasingly popular in
machine learning. They have been used in semi-supervised learning (Belkin and Niyogi, 2004;
Zhou et al., 2004; Zhu and Ghahramani, 2002), spectral clustering (Spielman and Teng, 1996;
von Luxburg, 2006) and dimensionality reduction (Belkin and Niyogi, 2003; Coifman and Lafon,
2006). Their popularity is mainly due to the following properties of the Laplacian, which
will be discussed in more detail in a later section:

• the Laplacian is the generator of the diffusion process (label propagation in semi- 
supervised learning), 

• the eigenvectors of the Laplacian have special geometric properties (motivation for 
spectral clustering), 

• the Laplacian induces an adaptive regularization functional, which adapts to the 
density and the geometric structure of the data (semi-supervised learning,
classification).

If the data lies in $\mathbb{R}^d$, the neighborhood graph built from the random sample can be seen
as an approximation of the continuous structure. In particular, if the data has support on
a low-dimensional submanifold, the neighborhood graph is a discrete approximation of the
submanifold. In machine learning we are interested in the intrinsic properties and objects
of this submanifold. The approximation of the Laplace-Beltrami operator via the graph
Laplacian is a particularly important one, since it has numerous applications, as we will
discuss later.

Approximations of the Laplace-Beltrami operator or related objects have been studied
for certain special deterministic graphs. The easiest case is a grid in $\mathbb{R}^d$. In numerics it is
standard to approximate the Laplacian with finite-difference schemes on the grid. These
can be seen as special instances of a graph Laplacian. Their convergence for decreasing
grid size follows easily by an argument using Taylor expansions. Another more involved
example is the work of Varopoulos (1984), where for a graph generated by an $\epsilon$-packing of a
manifold, the equivalence of certain properties of random walks on the graph and Brownian
motion on the manifold has been established. The connection between random walks and
the graph Laplacian becomes obvious by noting that the graph Laplacian as well as the
Laplace-Beltrami operator are the generators of the diffusion process on the graph and the
manifold, respectively. In Xu (2004) the convergence of a discrete approximation of the
Laplace-Beltrami operator for a triangulation of a 2D-surface in $\mathbb{R}^3$ was shown. However, it
is unclear whether the approximation described there can be written as a graph Laplacian
and whether this result can be generalized to higher dimensions.

In the case where the graph is generated randomly, only first results have been proved
so far. The first work on the large sample limit of graph Laplacians was done by
Bousquet et al. (2004). There the authors studied the convergence of the regularization
functional induced by the graph Laplacian using the law of large numbers for U-statistics.
In a second step, taking the limit of the neighborhood size $h \to 0$, they got
$\frac{1}{p^2}\nabla(p^2\nabla)$ as the effective limit operator in $\mathbb{R}^d$. Their result has recently been
generalized to the submanifold case and uniform convergence over the space of Hölder
functions by Hein (2005, 2006). In von Luxburg et al. (2007), the neighborhood size $h$ was
kept fixed while the large sample limit of the graph Laplacian was considered. In this
setting, the authors showed strong convergence results of graph Laplacians to certain
integral operators, which imply the convergence of the eigenvalues and eigenfunctions,
thereby showing the consistency of spectral clustering for a fixed neighborhood size.

In contrast to the previous work, in this paper we will consider the large sample limit
and the limit as the neighborhood size approaches zero simultaneously for a certain class
of neighborhood graphs. The main emphasis lies on the case where the data generating
measure has support on a submanifold of $\mathbb{R}^d$. The bias term, that is the difference
between the continuous counterpart of the graph Laplacian and the Laplacian itself, was
first studied for compact submanifolds without boundary by Smolyanov et al. (2000)
and Belkin (2003) for the Gaussian kernel and a uniform data generating measure and
was then generalized by Lafon (2004) to general isotropic weights and general probability
measures. Additionally, Lafon showed that the use of data-dependent weights for the
graph allows one to control the influence of the density. They all show that the bias term
converges pointwise if the neighborhood size goes to zero. The convergence of the graph
Laplacian towards these continuous averaging operators was left open. This part was first
studied by Hein et al. (2005) and Belkin and Niyogi (2005). In Belkin and Niyogi (2005)
the convergence was shown for the so-called unnormalized graph Laplacian in the case of
a uniform probability measure on a compact manifold without boundary and using the
Gaussian kernel for the weights, whereas in Hein et al. (2005) the pointwise convergence
was shown for the random walk graph Laplacian in the case of general probability measures
on non-compact manifolds with boundary using general isotropic data-dependent weights.
More recently, Giné and Koltchinskii (2006) have extended the pointwise convergence for
the unnormalized graph Laplacian shown by Belkin and Niyogi (2005) to uniform
convergence on compact submanifolds without boundary, giving explicit rates. In Singer (2006),
see also Giné and Koltchinskii (2006), the rate of convergence given by Hein et al. (2005)
has been improved in the setting of the uniform measure. In this paper we will study the
three most often used graph Laplacians in the machine learning literature and show their
pointwise convergence in the general setting of Lafon (2004) and Hein et al. (2005), that
is, we will in particular consider the case where, by using data-dependent weights for the
graph, we can control the influence of the density on the limit operator.

In Section 2 we introduce the basic framework necessary to define graph Laplacians
for general directed weighted graphs and then simplify the general case to undirected
graphs. In particular, we define the three graph Laplacians used in machine learning
so far, which we call the normalized, the unnormalized and the random walk Laplacian.
In Section 3 we introduce the neighborhood graphs studied in this paper, followed by an
introduction to the so-called weighted Laplace-Beltrami operator, which will turn out to be
the limit operator in general. We also study properties of this limit operator and provide
insights into why and how this operator can be used for semi-supervised learning, clustering
and regression. Then finally we present the main convergence result for all three graph
Laplacians and give the conditions on the neighborhood size as a function of the sample
size necessary for convergence. In Section 4 we illustrate the main result by studying the
difference between the three graph Laplacians and the effects of different data-dependent
weights on the limit operator. In Section 5 we prove the main result. We introduce a
framework for studying non-compact manifolds with boundary and provide the necessary
assumptions on the submanifold $M$, the data generating measure $P$ and the kernel $k$ used
for defining the weights of the edges. We would like to note that the theorems given in
Section 5 contain slightly stronger results than the ones presented in Section 3. The reader
who is not familiar with differential geometry will find a brief introduction to the basic
material used in this paper in Appendix A.



2 Abstract Definition of the Graph Structure 



In this section we define the structure on a graph which is required in order to define
the graph Laplacian. To this end one has to introduce Hilbert spaces $\mathcal{H}_V$ and $\mathcal{H}_E$ of
functions on the vertices $V$ and edges $E$, define a difference operator $d$, and then set
the graph Laplacian as $\Delta = d^*d$. We first do this in full generality for directed graphs
and then specialize it to undirected graphs. This approach is well known for undirected
graphs in discrete potential theory and spectral graph theory, see for example Dodziuk
(1984); Chung (1997); Woess (2000); McDonald and Meyers (2002), and was generalized
to directed graphs by Zhou et al. (2005) for a special choice of $\mathcal{H}_V$, $\mathcal{H}_E$ and $d$. To our
knowledge the very general setting introduced here has not been discussed elsewhere.

In many articles graph Laplacians are used without explicitly mentioning $d$, $\mathcal{H}_V$ and $\mathcal{H}_E$.
This can be misleading since, as we will show, there always exists a whole family of choices
for $d$, $\mathcal{H}_V$ and $\mathcal{H}_E$ which all yield the same graph Laplacian.



2.1 Hilbert Spaces of Functions on the Vertices V and the Edges E 

Let $(V, W)$ be a graph where $V$ denotes the set of vertices with $|V| = n$ and $W$ a positive
$n \times n$ similarity matrix, that is $w_{ij} \geq 0$, $i,j = 1,\dots,n$. $W$ need not be symmetric, that
means we consider the case of a directed graph. The special case of an undirected graph
will be discussed in a following section. Let $E \subset V \times V$ be the set of edges $e_{ij} = (i,j)$ with
$w_{ij} > 0$. $e_{ij}$ is said to be a directed edge from the vertex $i$ to the vertex $j$ with weight $w_{ij}$.
Moreover, we define the outgoing and ingoing sum of weights of a vertex $i$ as

\[ d_i^{out} = \frac{1}{n}\sum_{j=1}^n w_{ij}, \qquad d_i^{in} = \frac{1}{n}\sum_{j=1}^n w_{ji}. \]

We assume that $d_i^{out} + d_i^{in} > 0$, $i = 1,\dots,n$, meaning that each vertex has at least one
in- or outgoing edge. Let $\mathbb{R}_+ = \{x \in \mathbb{R} \mid x \geq 0\}$ and $\mathbb{R}_+^* = \{x \in \mathbb{R} \mid x > 0\}$. The inner
product on the function space $\mathbb{R}^V$ is defined as

\[ \langle f, g\rangle_V = \frac{1}{n}\sum_{i=1}^n f_i\, g_i\, \chi_i, \]

where $\chi_i = \chi_{out}(d_i^{out}) + \chi_{in}(d_i^{in})$ with $\chi_{out} : \mathbb{R}_+ \to \mathbb{R}_+$ and $\chi_{in} : \mathbb{R}_+ \to \mathbb{R}_+$,
$\chi_{out}(0) = \chi_{in}(0) = 0$ and further $\chi_{out}$ and $\chi_{in}$ strictly positive on $\mathbb{R}_+^*$.

We also define an inner product on the space of functions $\mathbb{R}^E$ on the edges:

\[ \langle F, G\rangle_E = \frac{1}{2n^2}\sum_{i,j=1}^n F_{ij}\, G_{ij}\, \phi(w_{ij}), \]

where $\phi : \mathbb{R}_+ \to \mathbb{R}_+$, $\phi(0) = 0$ and $\phi$ strictly positive on $\mathbb{R}_+^*$. Note that with these
assumptions on $\phi$ the sum is taken only over the set of edges $E$. One can check that
both inner products are well-defined. We denote by $\mathcal{H}(V,\chi) = (\mathbb{R}^V, \langle\cdot,\cdot\rangle_V)$ and
$\mathcal{H}(E,\phi) = (\mathbb{R}^E, \langle\cdot,\cdot\rangle_E)$ the corresponding Hilbert spaces. As a last remark let us clarify
the roles of $\mathbb{R}^V$ and $\mathbb{R}^E$. The first one is the space of functions on the vertices and therefore can be
regarded as a normal function space. However, elements of $\mathbb{R}^E$ can be interpreted as a
flow on the edges so that the function value on an edge $e_{ij}$ corresponds to the "mass"
flowing from the vertex $i$ to the vertex $j$ (per unit time).

2.2 The Difference Operator d and its Adjoint d* 

Definition 1 The difference operator $d : \mathcal{H}(V,\chi) \to \mathcal{H}(E,\phi)$ is defined as follows:

\[ \forall\, e_{ij} \in E, \qquad (df)(e_{ij}) = \gamma(w_{ij})\,\big(f(j) - f(i)\big), \tag{1} \]

where $\gamma : \mathbb{R}_+^* \to \mathbb{R}_+^*$.

Remark: Note that $d$ is zero on the constant functions, as one would expect from a
derivative. In Zhou et al. (2004) another difference operator $d$ is used:

\[ (df)(e_{ij}) = \gamma(w_{ij})\left(\frac{f(j)}{\sqrt{d_j}} - \frac{f(i)}{\sqrt{d_i}}\right). \tag{2} \]

Note that in Zhou et al. (2004) they have $\gamma(w_{ij}) \equiv 1$. This difference operator is in general
not zero on the constant functions. This in turn leads to the effect that the associated
Laplacian is also not zero on the constant functions. For general graphs without any
geometric interpretation this is just a matter of choice. However, the choice of $d$ matters
if one wants a consistent continuum limit of the graph. One cannot expect convergence
of the graph Laplacian associated to the difference operator $d$ of Equation (2) towards
a Laplacian: since each of the graph Laplacians in the sequence is not zero on the
constant functions, the limit operator will also share this property unless
$\lim_{n\to\infty} d_i = c$, $\forall i = 1,\dots,n$, where $c$ is a constant. We also derive the limit operator of the graph
Laplacian induced by the difference operator of Equation (2), introduced by Zhou et al. in
the machine learning literature and usually denoted as the normalized graph Laplacian in
spectral graph theory (Chung, 1997). Obviously, in the finite case $d$ is always a bounded
operator. The adjoint operator $d^* : \mathcal{H}(E,\phi) \to \mathcal{H}(V,\chi)$ is defined by

\[ \langle df, u\rangle_E = \langle f, d^*u\rangle_V, \qquad \forall f \in \mathcal{H}(V,\chi),\; u \in \mathcal{H}(E,\phi). \]



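Lemma 2 The adjoint $d^* : \mathcal{H}(E,\phi) \to \mathcal{H}(V,\chi)$ of the difference operator $d$ is explicitly
given by:

\[ (d^*u)(l) = \frac{1}{2\chi_l}\left(\frac{1}{n}\sum_{i=1}^n \gamma(w_{il})\,u_{il}\,\phi(w_{il}) - \frac{1}{n}\sum_{i=1}^n \gamma(w_{li})\,u_{li}\,\phi(w_{li})\right). \tag{3} \]

Proof: Using the indicator function $f(j) = \mathbb{1}_{j=l}$ it is straightforward to derive:

\[ \frac{1}{n}\chi_l\,(d^*u)(l) = \langle d\mathbb{1}_{\cdot=l}, u\rangle_E = \frac{1}{2n^2}\sum_{i=1}^n \big(\gamma(w_{il})\,u_{il}\,\phi(w_{il}) - \gamma(w_{li})\,u_{li}\,\phi(w_{li})\big), \]

where we have used $\langle d\mathbb{1}_{\cdot=l}, u\rangle_E = \frac{1}{2n^2}\sum_{i,j=1}^n (d\mathbb{1}_{\cdot=l})_{ij}\,u_{ij}\,\phi(w_{ij})$. □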

The first term of the right-hand side of (3) can be interpreted as the outgoing flow, whereas the second
term can be seen as the ingoing flow. The corresponding continuous counterpart of $d$ is
the gradient of a function and for $d^*$ it is the divergence of a vector field, measuring the
infinitesimal difference between in- and outgoing flow.

2.3 The General Graph Laplacian 

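Definition 3 (graph Laplacian for a directed graph) Given Hilbert spaces $\mathcal{H}(V,\chi)$
and $\mathcal{H}(E,\phi)$ and the difference operator $d : \mathcal{H}(V,\chi) \to \mathcal{H}(E,\phi)$, the graph Laplacian
$\Delta : \mathcal{H}(V,\chi) \to \mathcal{H}(V,\chi)$ is defined as

\[ \Delta = d^*d. \]

Lemma 4 Explicitly, $\Delta : \mathcal{H}(V,\chi) \to \mathcal{H}(V,\chi)$ is given as:

\[ (\Delta f)(l) = \frac{1}{2\chi_l}\left[ f(l)\,\frac{1}{n}\sum_{i=1}^n \big(\gamma(w_{il})^2\phi(w_{il}) + \gamma(w_{li})^2\phi(w_{li})\big) - \frac{1}{n}\sum_{i=1}^n f(i)\,\big(\gamma(w_{il})^2\phi(w_{il}) + \gamma(w_{li})^2\phi(w_{li})\big)\right]. \tag{4} \]

Proof: The explicit expression for $\Delta$ can be easily derived by plugging the expressions for $d^*$
and $d$ together:

\[ (d^*df)(l) = \frac{1}{2\chi_l}\left(\frac{1}{n}\sum_{i=1}^n \gamma(w_{il})^2\,[f(l) - f(i)]\,\phi(w_{il}) - \frac{1}{n}\sum_{i=1}^n \gamma(w_{li})^2\,[f(i) - f(l)]\,\phi(w_{li})\right)
 = \frac{1}{2\chi_l}\left[ f(l)\,\frac{1}{n}\sum_{i=1}^n \tilde{w}_{il} - \frac{1}{n}\sum_{i=1}^n f(i)\,\tilde{w}_{il}\right], \]

where we have introduced $\tilde{w}_{il} = \gamma(w_{il})^2\phi(w_{il}) + \gamma(w_{li})^2\phi(w_{li})$. □

Proposition 5 $\Delta$ is self-adjoint and positive semi-definite.

Proof: By definition, $\langle f, \Delta g\rangle_V = \langle df, dg\rangle_E = \langle \Delta f, g\rangle_V$, and $\langle f, \Delta f\rangle_V = \langle df, df\rangle_E \geq 0$. □
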
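The construction of Sections 2.1 to 2.3 can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes the edge inner product carries the prefactor $\frac{1}{2n^2}$, and the particular functions `chi`, `phi` and `gamma` are hypothetical choices that merely satisfy the stated positivity conditions. It verifies the adjoint relation, self-adjointness and positive semi-definiteness of $\Delta = d^*d$ on a small random directed graph.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Random non-symmetric weights w_ij >= 0 without self-loops: a directed graph.
W = rng.random((n, n))
W[W < 0.3] = 0.0
np.fill_diagonal(W, 0.0)

d_out = W.sum(axis=1) / n  # d_i^out = (1/n) sum_j w_ij
d_in = W.sum(axis=0) / n   # d_i^in  = (1/n) sum_j w_ji
assert np.all(d_out + d_in > 0)  # every vertex has an in- or outgoing edge

# Illustrative (hypothetical) choices; each vanishes at 0 and is positive otherwise.
chi = np.sqrt(d_out) + d_in            # chi_i = chi_out(d_i^out) + chi_in(d_i^in)
phi = W.copy()                         # phi(w) = w
gamma = np.where(W > 0, 1.0 + W, 0.0)  # gamma(w) = 1 + w, only needed on edges


def ip_V(f, g):
    """Inner product on vertex functions."""
    return np.sum(f * g * chi) / n


def ip_E(F, G):
    """Inner product on edge functions (assumed prefactor 1/(2 n^2))."""
    return np.sum(F * G * phi) / (2 * n**2)


def d(f):
    """Difference operator, Equation (1): (df)(e_ij) = gamma(w_ij) (f(j) - f(i))."""
    return gamma * (f[None, :] - f[:, None])


def d_adj(u):
    """Adjoint of d, Equation (3)."""
    flow = gamma * u * phi
    return (flow.sum(axis=0) - flow.sum(axis=1)) / (2 * n * chi)


def laplacian(f):
    """Graph Laplacian Delta = d^* d."""
    return d_adj(d(f))


f, g = rng.standard_normal(n), rng.standard_normal(n)
u = rng.standard_normal((n, n))

print(np.isclose(ip_E(d(f), u), ip_V(f, d_adj(u))))              # adjoint relation
print(np.isclose(ip_V(f, laplacian(g)), ip_V(laplacian(f), g)))  # self-adjointness
print(ip_V(f, laplacian(f)) >= 0)                                # positive semi-definiteness
```

Other admissible choices of `chi`, `phi` and `gamma` should pass the same checks, since the argument of Proposition 5 uses only the definition of the adjoint.
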
2.4 The Special Case of an Undirected Graph 

In the case of an undirected graph we have $w_{ij} = w_{ji}$, that is, whenever there is an edge
from $i$ to $j$ there is an edge with the same value from $j$ to $i$. This implies that there is no
difference between in- and outgoing edges. Therefore, $d_i^{out} \equiv d_i^{in}$, so that we will denote the
degree function by $d$ with $d_i = \frac{1}{n}\sum_{j=1}^n w_{ij}$. The same holds for the weights in $\mathcal{H}_V$, $\chi_{out} \equiv \chi_{in}$,
so that we have only one function $\chi$. If one likes to interpret functions on $E$ as flows, it is
reasonable to restrict the space $\mathcal{H}_E$ to antisymmetric functions, since symmetric functions
are associated to flows which transport the same mass from vertex $i$ to vertex $j$ and back.
Therefore, as a net effect, no mass is transported at all, so that from a physical point of
view these functions cannot be observed. Since we anyway consider only functions
on the edges of the form $df$ (where $f$ is in $\mathcal{H}_V$), which are by construction antisymmetric,
we will not impose this restriction explicitly. The adjoint $d^*$ simplifies in the undirected case
to

\[ (d^*u)(l) = \frac{1}{2\chi(d_l)}\,\frac{1}{n}\sum_{i=1}^n \gamma(w_{il})\,\phi(w_{il})\,(u_{il} - u_{li}), \]

and the general graph Laplacian on an undirected graph has the following form:

Definition 6 (graph Laplacian for an undirected graph) Given Hilbert spaces
$\mathcal{H}(V,\chi)$ and $\mathcal{H}(E,\phi)$ and the difference operator $d : \mathcal{H}(V,\chi) \to \mathcal{H}(E,\phi)$, the graph
Laplacian $\Delta : \mathcal{H}(V,\chi) \to \mathcal{H}(V,\chi)$ is defined as

\[ \Delta = d^*d. \]

Explicitly, for any vertex $l$, we have

\[ (\Delta f)(l) = (d^*df)(l) = \frac{1}{\chi(d_l)}\left[ f(l)\,\frac{1}{n}\sum_{i=1}^n \gamma^2(w_{il})\,\phi(w_{il}) - \frac{1}{n}\sum_{i=1}^n f(i)\,\gamma^2(w_{il})\,\phi(w_{il})\right]. \tag{5} \]

In the literature one finds the following special cases of the general graph Laplacian.
Unfortunately there exist no unique names for the three graph Laplacians we introduce here;
most of the time all of them are just called graph Laplacians. Only the term 'unnormalized'
or 'combinatorial' graph Laplacian seems to be established now. However, the other
two could both be called normalized graph Laplacian. Since the first one is closely related
to a random walk on the graph we call it random walk graph Laplacian and the other one
normalized graph Laplacian.
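The 'random walk' graph Laplacian is defined as:

\[ (\Delta^{(rw)}f)(i) = f(i) - \frac{1}{d_i}\,\frac{1}{n}\sum_{j=1}^n w_{ij}\,f(j), \qquad \Delta^{(rw)}f = (\mathbb{1} - D^{-1}W)f, \tag{6} \]

where the matrix $D$ is defined as $D_{ij} = d_i\,\delta_{ij}$. Here and in the matrix identities below the
factor $1/n$ is understood to be absorbed into $W$. Note that $P = D^{-1}W$ is a stochastic matrix
and therefore can be used to define a Markov random walk on $V$, see for example Woess
(2000). The 'unnormalized' (or 'combinatorial') graph Laplacian is defined as

\[ (\Delta^{(u)}f)(i) = d_i\,f(i) - \frac{1}{n}\sum_{j=1}^n w_{ij}\,f(j), \qquad \Delta^{(u)}f = (D - W)f. \tag{7} \]

We have the following conditions on $\chi$, $\gamma$ and $\phi$ in order to get these Laplacians:

\[ \forall\, e_{ij} \in E: \qquad \text{rw}: \; \frac{\gamma^2(w_{ij})\,\phi(w_{ij})}{\chi(d_i)} = \frac{w_{ij}}{d_i}, \qquad \text{unnorm}: \; \frac{\gamma^2(w_{ij})\,\phi(w_{ij})}{\chi(d_i)} = w_{ij}. \]
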



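We observe that by choosing $\Delta^{(rw)}$ or $\Delta^{(u)}$ the functions $\phi$ and $\gamma$ are not fixed. Therefore it
can cause confusion if one speaks of the 'random walk' or 'unnormalized' graph Laplacian
without explicitly defining the corresponding Hilbert spaces and the difference operator.
We also consider the normalized graph Laplacian $\Delta^{(n)}$ introduced by Chung (1997);
Zhou et al. (2004) using the difference operator of Equation (2) and the general spaces
$\mathcal{H}(V,\chi)$ and $\mathcal{H}(E,\phi)$. Following the scheme, a straightforward calculation shows the
following:

Lemma 7 The graph Laplacian $\Delta^{(n)} = d^*d$ with the difference operator $d$ from Equation
(2) can be explicitly written as

\[ (\Delta^{(n)}f)(l) = \frac{1}{\chi(d_l)\sqrt{d_l}}\left[ \frac{f(l)}{\sqrt{d_l}}\,\frac{1}{n}\sum_{i=1}^n \gamma^2(w_{il})\,\phi(w_{il}) - \frac{1}{n}\sum_{i=1}^n \frac{f(i)}{\sqrt{d_i}}\,\gamma^2(w_{il})\,\phi(w_{il})\right]. \]

The choice $\chi(d_l) = 1$ and $\gamma^2(w_{il})\,\phi(w_{il}) = w_{il}$ then leads to the graph Laplacian proposed
in Chung and Langlands (1996); Zhou et al. (2004),

\[ (\Delta^{(n)}f)(l) = \frac{1}{\sqrt{d_l}}\left[ \frac{f(l)}{\sqrt{d_l}}\,d_l - \frac{1}{n}\sum_{i=1}^n \frac{f(i)}{\sqrt{d_i}}\,w_{il}\right] = f(l) - \frac{1}{n}\sum_{i=1}^n f(i)\,\frac{w_{il}}{\sqrt{d_l\,d_i}}, \]

or equivalently

\[ \Delta^{(n)}f = D^{-\frac{1}{2}}(D - W)D^{-\frac{1}{2}}f = (\mathbb{1} - D^{-\frac{1}{2}}WD^{-\frac{1}{2}})f. \]

Note that $\Delta^{(u)} = D\,\Delta^{(rw)}$ and $\Delta^{(n)} = D^{-\frac{1}{2}}\,\Delta^{(u)}\,D^{-\frac{1}{2}}$.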
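The three special cases and the two identities just stated can be illustrated with a few lines of NumPy. This is only a sketch under the assumption that the factor $1/n$ of the sums is absorbed into the weight matrix before forming the matrix expressions; the function name `graph_laplacians` and the example weights are hypothetical.

```python
import numpy as np


def graph_laplacians(W):
    """Return (L_rw, L_un, L_n, deg) for a symmetric weight matrix W."""
    n = W.shape[0]
    K = W / n                        # absorb the factor 1/n of the sums
    deg = K.sum(axis=1)              # d_i = (1/n) sum_j w_ij
    eye = np.eye(n)
    inv_sqrt = 1.0 / np.sqrt(deg)
    L_rw = eye - K / deg[:, None]                           # random walk
    L_un = np.diag(deg) - K                                 # unnormalized
    L_n = eye - inv_sqrt[:, None] * K * inv_sqrt[None, :]   # normalized
    return L_rw, L_un, L_n, deg


# Small undirected example graph with non-constant degree.
W = np.array([[0.0, 1.0, 2.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 3.0],
              [0.0, 0.0, 3.0, 0.0]])
L_rw, L_un, L_n, deg = graph_laplacians(W)
D = np.diag(deg)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
ones = np.ones(len(deg))

print(np.allclose(L_un, D @ L_rw))                       # Delta^(u) = D Delta^(rw)
print(np.allclose(L_n, D_inv_sqrt @ L_un @ D_inv_sqrt))  # Delta^(n) = D^-1/2 Delta^(u) D^-1/2
print(np.allclose(L_rw @ ones, 0), np.allclose(L_un @ ones, 0))
print(np.allclose(L_n @ ones, 0), np.allclose(L_n @ np.sqrt(deg), 0))
```

The last line reflects the remark after Equation (2): the normalized Laplacian does not annihilate the constant functions unless the degree is constant; its null vector is $\sqrt{d}$ instead.
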
3 Limit of the Graph Laplacian for Random Neighborhood Graphs



Before we state the convergence results for the three graph Laplacians on random
neighborhood graphs, we first have to define the limit operator. Maybe not surprisingly, in
general the Laplace-Beltrami operator will not be the limit operator of the graph Laplacian.
Instead it will converge to the weighted Laplace-Beltrami operator which is the natural 
generalization of the Laplace-Beltrami operator for a Riemannian manifold equipped with 
a non-uniform probability measure. The definition of this limit operator and a discussion 
of its use for different applications in clustering, semi-supervised learning and regression 
is the topic of the next section, followed by a sketch of the convergence results. 



3.1 Construction of the Neighborhood Graph 

We assume that we have a sample $X_i$, $i = 1,\dots,n$, drawn i.i.d. from a probability measure $P$
which has support on a submanifold $M$. For the exact assumptions regarding $P$, $M$ and
the kernel function $k$ used to define the weights we refer to Section 5.2. The sample then
determines the set of vertices $V$ of the graph. Additionally we are given a certain kernel
function $k : \mathbb{R}_+ \to \mathbb{R}_+$ and the neighborhood parameter $h \in \mathbb{R}_+^*$. As proposed by Lafon
(2004); Coifman and Lafon (2006), we use this kernel function $k$ to define the following
family of data-dependent kernel functions $k_{\lambda,h}$ parameterized by $\lambda \in \mathbb{R}$ as:

\[ k_{\lambda,h}(X_i, X_j) = \frac{1}{h^m}\,\frac{k(\|X_i - X_j\|^2/h^2)}{[d_{h,n}(X_i)\,d_{h,n}(X_j)]^{\lambda}}, \]

where $d_{h,n}(X_i) = \frac{1}{n}\sum_{j=1}^n \frac{1}{h^m}\,k(\|X_i - X_j\|^2/h^2)$ is the degree function introduced in
Section 2 with respect to the edge weights $\frac{1}{h^m}\,k(\|X_i - X_j\|^2/h^2)$. Finally we use $k_{\lambda,h}$ to define
the weight $w_{ij} = w(X_i, X_j)$ of the edge between the points $X_i$ and $X_j$ as

\[ w_{\lambda,h}(X_i, X_j) = k_{\lambda,h}(X_i, X_j). \]

Note that the case $\lambda = 0$ corresponds to weights with no data-dependent modification.
The parameter $h \in \mathbb{R}_+^*$ determines the neighborhood of a point since we will assume that
$k$ has compact support, that is $X_i$ and $X_j$ have an edge if $\|X_i - X_j\| \leq h R_k$, where $R_k$ is
the support of the kernel function. Note that we will have $k(0) = 0$, so that there are no loops
in the graph. This assumption is not necessary, but it simplifies the proofs and makes
some of the estimators unbiased.
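The weight construction can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the truncated linear kernel `kernel`, the helper `neighborhood_weights`, the equispaced points on the unit circle (used instead of an i.i.d. sample so that the output is deterministic) and the value `h = 0.5` are all hypothetical choices, and every point is assumed to have at least one neighbor so that the degrees are positive.

```python
import numpy as np


def kernel(t):
    """Hypothetical compactly supported kernel k(t) = (1 - t)_+ with t = ||x - y||^2 / h^2."""
    return np.clip(1.0 - t, 0.0, None)


def neighborhood_weights(X, h, lam, m):
    """Weights k_{lam,h}(X_i, X_j) and degrees d_{h,n}(X_i) for a sample X of shape (n, d)."""
    n = X.shape[0]
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = kernel(sq_dist / h**2) / h**m
    np.fill_diagonal(K, 0.0)             # enforce k(0) = 0, so there are no loops
    deg = K.sum(axis=1) / n              # d_{h,n}(X_i), assumed positive for every i
    W = K / np.outer(deg, deg) ** lam    # data-dependent reweighting
    return W, deg


# 50 equispaced points on the unit circle: a one-dimensional submanifold (m = 1) of R^2.
theta = 2 * np.pi * np.arange(50) / 50
X = np.column_stack([np.cos(theta), np.sin(theta)])

W0, deg = neighborhood_weights(X, h=0.5, lam=0.0, m=1)  # no data-dependent modification
W1, _ = neighborhood_weights(X, h=0.5, lam=1.0, m=1)    # density fully divided out
```

For this symmetric point configuration the degree function is constant, so the reweighting with $\lambda = 1$ only rescales all weights by the same factor; with a non-uniform sample the reweighting changes the relative edge weights.
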

In Section 2.4 we introduced the random walk, the unnormalized and the normalized graph
Laplacian. From now on we consider these graph Laplacians for the random neighborhood
graph, that is the weights of the graph $w_{ij}$ have the form $w_{ij} = w(X_i, X_j) = k_{\lambda,h}(X_i, X_j)$.
Using the kernel function we can easily extend the graph Laplacians to the whole
submanifold $M$. These extensions can be seen as estimators for the Laplacian on $M$. We
also introduce the extended degree function $d_{\lambda,h,n}$ and the average operator $A_{\lambda,h,n}$,

\[ d_{\lambda,h,n}(x) = \frac{1}{n}\sum_{j=1}^n k_{\lambda,h}(x, X_j), \qquad (A_{\lambda,h,n}f)(x) = \frac{1}{n}\sum_{j=1}^n k_{\lambda,h}(x, X_j)\,f(X_j). \]

Note that $d_{\lambda,h,n} = A_{\lambda,h,n}\mathbb{1}$. The extended graph Laplacians are defined as follows:

\[ \text{random walk:} \quad (\Delta^{(rw)}_{\lambda,h,n}f)(x) = \frac{1}{h^2}\left(f - \frac{1}{d_{\lambda,h,n}}A_{\lambda,h,n}f\right)(x) = \frac{1}{h^2}\left(\frac{A_{\lambda,h,n}g}{d_{\lambda,h,n}}\right)(x), \tag{8} \]

\[ \text{unnormalized:} \quad (\Delta^{(u)}_{\lambda,h,n}f)(x) = \frac{1}{h^2}\big[d_{\lambda,h,n}f - A_{\lambda,h,n}f\big](x) = \frac{1}{h^2}(A_{\lambda,h,n}g)(x), \tag{9} \]

\[ \text{normalized:} \quad (\Delta^{(n)}_{\lambda,h,n}f)(x) = \frac{1}{h^2\sqrt{d_{\lambda,h,n}(x)}}\left(d_{\lambda,h,n}\frac{f}{\sqrt{d_{\lambda,h,n}}} - A_{\lambda,h,n}\frac{f}{\sqrt{d_{\lambda,h,n}}}\right)(x) = \frac{1}{h^2\sqrt{d_{\lambda,h,n}(x)}}(A_{\lambda,h,n}g')(x), \tag{10} \]

where we have introduced $g(y) := f(x) - f(y)$ and $g'(y) := \frac{f(x)}{\sqrt{d_{\lambda,h,n}(x)}} - \frac{f(y)}{\sqrt{d_{\lambda,h,n}(y)}}$. Note
that all extensions reproduce the graph Laplacian on the sample:

\[ (\Delta f)(i) = (\Delta f)(X_i) = (\Delta_{\lambda,h,n}f)(X_i), \qquad i = 1,\dots,n. \]

The factor $1/h^2$ arises by introducing a factor $1/h$ in the weight $\gamma$ of the derivative operator
$d$ of the graph. This is necessary since $d$ is supposed to approximate a derivative. Since
the Laplacian corresponds to a second derivative we get from the definition of the graph
Laplacian a factor $1/h^2$.

We would like to note that in the case of the random walk and the normalized graph
Laplacian the normalization with $1/h^m$ in the weights cancels out, whereas it does not
cancel for the unnormalized graph Laplacian except in the case $\lambda = 1/2$. The problem
here is that in general the intrinsic dimension $m$ of the manifold is unknown. Therefore a
normalization with the correct factor $\frac{1}{h^m}$ is not possible, and in the limit $h \to 0$ the estimate
of the unnormalized graph Laplacian will generally either vanish or blow up. The easy way
to circumvent this is just to rescale the whole estimate such that $\frac{1}{n}\sum_{i=1}^n d_{\lambda,h,n}(X_i)$ equals
a fixed constant for every $n$. The disadvantage is that this method of rescaling introduces
a global factor in the limit. A more elegant way might be to simultaneously estimate the
dimension $m$ of the submanifold and use the estimated dimension to calculate the correct
normalization factor, see e.g. Hein and Audibert (2005). However, in this work we assume
for simplicity that for the unnormalized graph Laplacian the intrinsic dimension $m$ of the
submanifold is known. It might be interesting to consider both estimates simultaneously,
but we leave this as an open problem.

We will consider in the following the limit $h \to 0$, that is the neighborhood of each point $X_i$
shrinks to zero. However, since $n \to \infty$ and $h$ as a function of $n$ approaches zero sufficiently
slowly, the number of points in each neighborhood approaches $\infty$, so that, roughly speaking,
sums approximate the corresponding integrals. This is the basic principle behind our
convergence result and is well known in the framework of nonparametric regression (see
Györfi et al., 2004).



3.2 The Weighted Laplacian and the Continuous Smoothness Functional 

The Laplacian is one of the most prominent operators in mathematics. The following
general properties are taken from the books of Rosenberg (1997) and Bérard (1986). The
Laplacian occurs in many partial differential equations governing physics, mainly because in
Euclidean space it is the only linear second-order differential operator which is translation
and rotation invariant. In Euclidean space $\mathbb{R}^d$ it is defined as
$\Delta_{\mathbb{R}^d}f = \mathrm{div}(\mathrm{grad} f) = \sum_{i=1}^d \partial_i^2 f$. Moreover,
for any domain $\Omega \subseteq \mathbb{R}^d$ it is a negative-semidefinite symmetric operator on $C_c^\infty(\Omega)$, which
is a dense subset of $L_2(\Omega)$ (formally self-adjoint), and satisfies

\[ \int_\Omega f\,\Delta h\, dx = -\int_\Omega \langle \nabla f, \nabla h\rangle\, dx. \]

It can be extended to a self-adjoint operator on $L_2(\Omega)$ in several ways depending on
the choice of boundary conditions. For any compact domain $\Omega$ (with suitable boundary
conditions) it can be shown that $\Delta$ has a pure point spectrum and the eigenfunctions are
smooth and form a complete orthonormal basis of $L_2(\Omega)$, see e.g. Bérard (1986).



The Laplace-Beltrami operator on a Riemannian manifold $M$ is the natural equivalent of
the Laplacian in $\mathbb{R}^d$, defined as

\[ \Delta_M f = \mathrm{div}(\mathrm{grad} f) = \nabla^a\nabla_a f. \]

However, the more natural definition is the following. For any $f, h \in C_c^\infty(M)$, we have

\[ \int_M f\,\Delta h\, dV(x) = -\int_M \langle \nabla f, \nabla h\rangle\, dV(x), \]

where $dV = \sqrt{\det g}\, dx$ is the natural volume element of $M$. This definition easily allows an
extension to the case where we have a Riemannian manifold $M$ with a measure $P$. In this
paper $P$ will be the probability measure generating the data. We assume in the following
that $P$ is absolutely continuous with respect to the natural volume element $dV$ of the manifold. Its
density is denoted by $p$. Note that the case when the probability measure is absolutely
continuous with respect to the Lebesgue measure on $\mathbb{R}^d$ is a special case of our setting.

A recent review article about the weighted Laplace-Beltrami operator is that of Grigoryan.


Definition 8 (Weighted Laplacian) Let $(M, g_{ab})$ be a Riemannian manifold with measure
$P$, where $P$ has a differentiable and positive density $p$ with respect to the natural volume
element $dV = \sqrt{\det g}\, dx$, and let $\Delta_M$ be the Laplace-Beltrami operator on $M$. For $s \in \mathbb{R}$,
we define the $s$-th weighted Laplacian $\Delta_s$ as

\[ \Delta_s := \Delta_M + \frac{s}{p}\,g^{ab}(\nabla_a p)\nabla_b = \frac{1}{p^s}\,g^{ab}\nabla_a(p^s\nabla_b) = \frac{1}{p^s}\,\mathrm{div}(p^s\,\mathrm{grad}). \tag{11} \]

This definition is motivated by the following equality, for $f, g \in C_c^\infty(M)$,

\[ \int_M f\,(\Delta_s g)\,p^s\, dV = \int_M f\left(\Delta g + \frac{s}{p}\langle \nabla p, \nabla g\rangle\right)p^s\, dV = -\int_M \langle \nabla f, \nabla g\rangle\, p^s\, dV. \tag{12} \]

The family of weighted Laplacians contains two cases which are particularly interesting.
The first one, $s = 0$, corresponds to the standard Laplace-Beltrami operator. This case
is interesting if one only wants to use properties of the geometry of the manifold but not
of the data generating probability measure. The second case, $s = 1$, corresponds to the
standard weighted Laplacian $\Delta_1 = \frac{1}{p}\nabla^a(p\nabla_a)$.
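The equivalence of the two forms in Equation (11) follows from the product rule for the divergence. The short derivation below spells out this step; the one-dimensional Gaussian example at the end is an illustrative assumption and is not taken from the text.

```latex
\begin{align*}
\frac{1}{p^s}\,\mathrm{div}\big(p^s\,\mathrm{grad} f\big)
  &= \frac{1}{p^s}\Big(p^s\,\Delta_M f + \langle \nabla p^s, \nabla f\rangle\Big) \\
  &= \Delta_M f + \frac{s\,p^{s-1}}{p^s}\,\langle \nabla p, \nabla f\rangle
   = \Delta_M f + \frac{s}{p}\,\langle \nabla p, \nabla f\rangle .
\end{align*}
% Illustrative special case (assumption): M = \mathbb{R} with standard Gaussian
% density p(x) \propto e^{-x^2/2}, so that p'(x)/p(x) = -x and
\[
  \Delta_s f = f'' - s\,x\,f' .
\]
```

In this example the first-order term vanishes for $s = 0$, and its magnitude grows with $|s|$ and with the distance from the mode of the density.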

In the next section it will turn out that through a data-dependent change of the weights
of the graph we can get the just defined weighted Laplacians as the limit operators of the
graph Laplacian. The rest of this section will be used to motivate the importance of
understanding this limit in different applications. Three different but closely related
properties of the Laplacian are used in machine learning:

• The Laplacian generates the diffusion process. In semi-supervised learning algorithms
with a small number of labeled points one would like to propagate the labels
along regions of high density, see Zhu and Ghahramani (2002); Zhu et al. (2003).
The limit operator shows the influence of a non-uniform density $p$. The second
term $\frac{s}{p}\langle \nabla p, \nabla f\rangle$ leads to an anisotropy in the diffusion. If $s < 0$ this term enforces
diffusion in the direction of the maximum of the density whereas diffusion in the
direction away from the maximum of the density is weakened. If $s > 0$ this is just
the other way round.

• The smoothness functional induced by the weighted Laplacian $\Delta_s$, see Equation (12),
is given by

\[ S(f) = \int_M \|\nabla f\|^2\, p^s\, dV. \]

For $s > 0$ this smoothness functional prefers functions which are smooth in high-density
regions, whereas unsmooth behavior in low-density regions is penalized less. This
property can also be interesting in semi-supervised learning where one assumes,
especially when only a few labeled points are known, that the classifier should be
constant in high-density regions whereas changes of the classifier are allowed in
low-density regions, see Bousquet et al. (2004) for some discussion of this point and
Hein (2005, 2006) for a proof of convergence of the regularizer induced by the graph
Laplacian towards the smoothness functional $S(f)$. In Figure 1 this is illustrated by
mapping a density profile in $\mathbb{R}^2$ onto a two-dimensional manifold. However, the
case $s < 0$ can also be interesting. Minimizing the smoothness functional $S(f)$ implies
that one enforces smoothness of the function $f$ where one has little data, and one
allows the function to vary more where one has sampled a lot of data points. Such
a penalization has been considered by Canu and Elisseeff for regression.

Figure 1: A density profile mapped onto a two-dim. submanifold in $\mathbb{R}^3$ with two clusters.

• The eigenfunctions of the Laplacian $\Delta_s$ can be seen as the limit partitioning of
spectral clustering for the normalized graph Laplacian (however, a rigorous mathematical
proof has not been given yet, see von Luxburg et al. (2007) for a convergence result
for fixed $h$). If $s = 0$ one gets just a geometric clustering in the sense that, irrespective
of the probability measure generating the data, the clustering is determined
by the geometry of the submanifold. If $s > 0$ the eigenfunction corresponding to
the first non-zero eigenvalue is likely to change its sign in a low-density region. This
argument follows from the previous discussion on the smoothness functional $S(f)$
and the Rayleigh-Ritz principle. Let us assume for a moment that $M$ is compact
without boundary and that $p(x) > 0$, $\forall x \in M$; then the eigenspace corresponding
to the first eigenvalue $\lambda_0 = 0$ is given by the constant functions. The first non-zero
eigenvalue can then be determined by the Rayleigh-Ritz variational principle

\[ \lambda_1 = \inf_{u \in C^\infty(M)} \left\{ \frac{\int_M \|\nabla u\|^2\, p(x)^s\, dV(x)}{\int_M u^2(x)\, p(x)^s\, dV(x)} \;\middle|\; \int_M u(x)\, p(x)^s\, dV(x) = 0 \right\}. \]

Since the corresponding eigenfunction has to be orthogonal to the constant functions, it has
to change its sign. However, since $\|\nabla u\|^2$ is weighted by a power of the density $p^s$, it
is obvious that for $s > 0$ the function will change its sign in a region of low density.



3.3 Limit of the Graph Laplacians 

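The following theorem summarizes and slightly weakens the results of Theorem 27 and
Theorem 28 of Section 5.

Main Result Let $M$ be an $m$-dimensional submanifold in $\mathbb{R}^d$, $\{X_i\}_{i=1}^n$ a sample from a
probability measure $P$ on $M$ with density $p$. Let $x \in M \setminus \partial M$ and define $s = 2(1 - \lambda)$.
Then under technical conditions on the submanifold $M$, the kernel $k$ and the density $p$
introduced in Section 5, if $h \to 0$ and $nh^{m+2}/\log n \to \infty$,

\[ \text{random walk:} \quad \lim_{n\to\infty} (\Delta^{(rw)}_{\lambda,h,n}f)(x) \sim -(\Delta_s f)(x) \quad \text{almost surely,} \]

\[ \text{unnormalized:} \quad \lim_{n\to\infty} (\Delta^{(u)}_{\lambda,h,n}f)(x) \sim -p(x)^{1-2\lambda}\,(\Delta_s f)(x) \quad \text{almost surely.} \]

The optimal rate is obtained for $h(n) = O\big((\log n/n)^{\frac{1}{m+4}}\big)$. If $h \to 0$ and $nh^{m+4}/\log n \to \infty$,

\[ \text{normalized:} \quad \lim_{n\to\infty} (\Delta^{(n)}_{\lambda,h,n}f)(x) \sim -p(x)^{\frac{1}{2}-\lambda}\,\Delta_s\!\left(\frac{f}{p^{\frac{1}{2}-\lambda}}\right)(x) \quad \text{almost surely,} \]

where $\sim$ means that there exists a constant only depending on the kernel $k$ and $\lambda$ such that
equality holds.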

The first observation is that the conjecture that the graph Laplacian approximates the 
Laplace-Beltrami operator is only true for the uniform measure, where p is constant. 
In this case all limits agree up to constants. However, big differences arise when one 
has a non-uniform measure on the submanifold, which is the generic case in machine 
learning applications. In this case all limits disagree and only the random walk graph 
Laplacian converges towards the weighted Laplace-Beltrami operator which is the natural
generalization of the Laplace-Beltrami operator when the manifold is equipped with a
non-uniform probability measure. The unnormalized graph Laplacian has the additional
factor $p^{1-2\lambda}$. However, this limit is actually quite useful, when one thinks of applications
of so called label propagation algorithms in semi-supervised learning. If one uses this
graph Laplacian as the diffusion operator to propagate the labeled data, it means that the
diffusion for $\lambda < 1/2$ is faster in regions where the density is high. The consequence is that
labels in regions of high density are propagated faster than labels in low-density regions.
This makes sense since under the cluster assumption labels in regions of low density are
less informative than labels in regions of high density. In general, from the viewpoint of a
diffusion process the weighted Laplace-Beltrami operator $\Delta_s = \Delta_M + \frac{s}{p}\nabla p \nabla$ can be seen as
inducing an anisotropic diffusion due to the extra term $\frac{s}{p}\nabla p \nabla$, which is directed towards or
away from increasing density depending on $s$. This is a desired property in semi-supervised
learning, where one actually wants that the diffusion is mainly along regions of the same 
density level in order to fulfill the cluster assumption. 

The second observation is that the data-dependent modification of edge weights allows one to
control the influence of the density on the limit operator, as observed by Coifman and Lafon
(2006). In fact one can even eliminate it for $s = 0$, resp. $\lambda = 1$, in the case of the random
walk graph Laplacian. This could be interesting in computer graphics where the random
walk graph Laplacian is used for mesh and point cloud processing, see e.g. Sorkine (2006).
If one has gathered points of a curved object with a laser scanner it is likely that the
points have a non-uniform distribution on the object. Its surface is a two-dimensional
submanifold in $\mathbb{R}^3$. In computer graphics the non-uniform measure is only an artefact
of the sampling procedure and one is only interested in the Laplace-Beltrami operator to
infer geometric properties. Therefore the elimination of the influence of a non-uniform
measure on the submanifold is of high interest there. We note that up to a multiplication
with the inverse of the density the elimination of density effects is also possible for the
unnormalized graph Laplacian, but not for the normalized graph Laplacian. All previous
observations are naturally also true if the data does not lie on a submanifold but has
d-dimensional support in $\mathbb{R}^d$.

The interpretation of the limit of the normalized graph Laplacian is more involved. An 
expansion of the limit operator shows the complex dependency on the density p: 



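$$p^{\frac12-\lambda}\, \Delta_s\Big(\frac{f}{p^{\frac12-\lambda}}\Big) = \Delta_M f + \frac{1}{p}\langle \nabla p, \nabla f\rangle - \Big(\lambda - \frac12\Big)^2 \frac{f}{p^2}\,\|\nabla p\|^2 + \Big(\lambda - \frac12\Big)\frac{f}{p}\,\Delta_M p.$$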
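
As a check on this expansion, the following sketch carries out the computation directly from $\Delta_s = \Delta_M + \frac{s}{p}\langle \nabla p, \nabla\,\cdot\,\rangle$ and $s = 2(1-\lambda)$. The shorthands $q = \frac12 - \lambda$ (so that $s = 1 + 2q$) and $u = f\,p^{-q}$ are introduced here only for readability; they are not notation from the original text.

```latex
\begin{align*}
\nabla u &= p^{-q}\,\nabla f - q\, f\, p^{-q-1}\,\nabla p, \\
\Delta_M u &= p^{-q}\,\Delta_M f - 2q\, p^{-q-1}\langle \nabla p, \nabla f\rangle
            - q\, f\, p^{-q-1}\,\Delta_M p + q(q+1)\, f\, p^{-q-2}\,\|\nabla p\|^2, \\
\frac{s}{p}\langle \nabla p, \nabla u\rangle &= s\, p^{-q-1}\langle \nabla p, \nabla f\rangle
            - s\, q\, f\, p^{-q-2}\,\|\nabla p\|^2, \\
p^{q}\,\Delta_s u &= \Delta_M f + \frac{s-2q}{p}\langle \nabla p, \nabla f\rangle
            + \big(q(q+1) - sq\big)\frac{f}{p^2}\|\nabla p\|^2 - q\,\frac{f}{p}\,\Delta_M p \\
          &= \Delta_M f + \frac{1}{p}\langle \nabla p, \nabla f\rangle
            - \Big(\lambda - \tfrac12\Big)^2 \frac{f}{p^2}\|\nabla p\|^2
            + \Big(\lambda - \tfrac12\Big)\frac{f}{p}\,\Delta_M p .
\end{align*}
```

The last step uses $s - 2q = 1$ and $q(q+1) - sq = -q^2$. For $\lambda = \frac12$ both density-dependent correction terms vanish and the expression reduces to $\Delta_s f$ with $s = 1$.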

We leave it to the reader to think of possible applications of this Laplacian. 
The discussion shows that the choice of the graph Laplacian depends on what kind of 
problem one wants to solve. Therefore, in our opinion there is no universal best choice 
between the random walk and the unnormalized graph Laplacian from a machine learning 
point of view. However, from a mathematical point of view only the random walk graph 
Laplacian has the correct (pointwise) limit to the weighted Laplace-Beltrami operator. 

4 Illustration of the Results 

In this section we want to illustrate the differences between the three graph Laplacians
and the control of the influence of the data-generating measure via the parameter $\lambda$.

Figure 2: For the uniform distribution all graph Laplacians, $\Delta^{(rw)}_{\lambda,h,n}$, $\Delta^{(u)}_{\lambda,h,n}$ and $\Delta^{(n)}_{\lambda,h,n}$ (from
left to right) agree up to constants for all $\lambda$. In the figure the estimates of the Laplacian
are shown for the uniform measure on $[-3,3]^2$ and the function $f(x) = \sin(\frac12 \|x\|^2)/\|x\|^2$
with 2500 samples and $h = 1.4$.



4.1 Flat Space 

In the first example the data lies in Euclidean space $\mathbb{R}^2$. Here we want to show two things.
First, the sketch of the main result shows that if the data-generating measure is uniform
all graph Laplacians converge for all values of the reweighting parameter $\lambda$ up to constants
to the Laplace-Beltrami operator, which is in the case of $\mathbb{R}^2$ just the standard Laplacian.
In Figure 2 the estimates of the three graph Laplacians are shown for the uniform measure
on $[-3,3]^2$ and $\lambda = 0$. It can be seen that up to a scaling all estimates agree very well. In a
second example we study the effect of a non-uniform data-generating measure. In general
all estimates disagree in this case. We illustrate this effect in the case of $\mathbb{R}^2$ with a Gaussian
distribution $\mathcal{N}(0,1)$ as data-generating measure and the simple function $f(x) = \sum_i x_i - 4$.
Note that $\Delta f = 0$ so that for the random walk and the unnormalized graph Laplacian
only the anisotropic part of the limit operator, $\frac{s}{p}\nabla p \nabla f$, is non-zero. Explicitly the limits
are given as

$$\Delta^{(rw)} \sim \Delta_s f = \Delta f + \frac{s}{p}\nabla p \nabla f = -s \sum_i x_i,$$

$$\Delta^{(u)} \sim p^{1-2\lambda}\, \Delta_s f = -s\, \frac{e^{-\frac{1-2\lambda}{2}\|x\|^2}}{(2\pi)^{1-2\lambda}} \sum_i x_i,$$

$$\Delta^{(n)} \sim p^{\frac12-\lambda}\, \Delta_s \frac{f}{p^{\frac12-\lambda}} = -\sum_i x_i + \Big(\sum_i x_i - 4\Big)\Big[\Big(\lambda - \frac12\Big)\Big(\frac32 - \lambda\Big)\|x\|^2 - 2\Big(\lambda - \frac12\Big)\Big].$$
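
The closed form of the random walk limit in the Gaussian example can be checked numerically. The following is a minimal sketch, assuming the standard Gaussian density on $\mathbb{R}^2$ and the function $f(x) = \sum_i x_i - 4$ from the text; the helper names (`gaussian_density`, `f_example`, `weighted_laplacian_fd`) and the finite-difference scheme are illustrative choices, not part of the original paper.

```python
import numpy as np

def gaussian_density(x):
    """Standard Gaussian density on R^d (assumed data-generating density)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * np.dot(x, x)) / (2.0 * np.pi) ** (x.size / 2.0)

def f_example(x):
    """Test function f(x) = sum_i x_i - 4 from the text."""
    return float(np.sum(x)) - 4.0

def weighted_laplacian_fd(f, p, x, s, eps=1e-3):
    """Central finite-difference estimate of
    Delta_s f = Delta f + (s / p) <grad p, grad f> in flat space."""
    x = np.asarray(x, dtype=float)
    laplacian, drift = 0.0, 0.0
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        laplacian += (f(x + e) - 2.0 * f(x) + f(x - e)) / eps**2
        df = (f(x + e) - f(x - e)) / (2.0 * eps)
        dp = (p(x + e) - p(x - e)) / (2.0 * eps)
        drift += dp * df
    return laplacian + s * drift / p(x)

x0 = np.array([0.7, -0.3])
for s in (2.0, 0.0, -2.0):
    numeric = weighted_laplacian_fd(f_example, gaussian_density, x0, s)
    closed_form = -s * x0.sum()          # -s * sum_i x_i
    print(s, numeric, closed_form)
```

The two printed columns should agree up to the finite-difference error, since $\nabla p / p = -x$ for the standard Gaussian and $\nabla f = (1, 1)$.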



This shows that even applied to simple functions there can be large differences between
the different limit operators provided the samples come from a non-uniform probability
measure. Note that like in nonparametric kernel regression the estimate is quite bad at
the boundary. This well-known boundary effect arises since at the boundary one does not
average over a full ball but only over some part of a ball. Thus the first derivative $\nabla f$ of
order $O(h)$ does not cancel out so that multiplied with the factor $1/h^2$ we have a term of
order $O(1/h)$ which blows up. Roughly speaking, this effect takes place at all points of order
$O(h)$ away from the boundary, see also Coifman and Lafon (2006).

4.2 The Sphere

In our next example we consider the case where the data lies on a submanifold $M$ in $\mathbb{R}^d$.
Here we want to illustrate in the case of a sphere $S^2$ in $\mathbb{R}^3$ the control of the influence
of the density via the parameter $\lambda$. In this case we sample from the probability measure
with density $p(\phi, \theta) = \frac{1}{8\pi} + \frac{3}{8\pi}\cos^2(\theta)$ in spherical coordinates with respect to the volume
element $dV = \sin(\theta)\, d\theta\, d\phi$. This density has a two-cluster structure on the sphere, where
the northern and southern hemisphere each represent one cluster. An estimate of the density
$p$ is shown in Figure 4. We show the results of the random walk graph Laplacian
together with the result of the weighted Laplace-Beltrami operator and an error plot for
$\lambda = 0, 1, 2$ resulting in $s = 2, 0, -2$ for the function $f(\phi, \theta) = \cos(\theta)$. First one can see that
for a non-uniform probability measure the results for different values of $\lambda$ differ quite a
lot. Note that the function $f$ is adapted to the cluster structure in the sense that it does
not change much in each cluster but changes very much in the region of low density. In the
case of $s = 2$ we can see that $\Delta_s f$ would lead to a diffusion which would lead roughly to
a kind of step function which changes at the equator. The same is true for $s = 0$ but the
effect is much smaller than for $s = 2$. In the case of $s = -2$ we have a completely different
behavior. $\Delta_s f$ has now flipped its sign near the equator so that the induced diffusion
process would try to smooth the function in the low-density region.



5 Proof of the Main Result 

In this section we will present the main results which were sketched in Section 3.3 together
with the proofs. In Section 5.1 we first introduce some non-standard tools from differential
geometry which we will use later on. In particular, it turns out that the so called manifolds
with boundary of bounded geometry are the natural framework where one can still deal
with non-compact manifolds in a setting comparable to the compact case. After a proper
statement of the assumptions under which we prove the convergence results of the graph
Laplacian and a preliminary result about convolutions on submanifolds which is of interest
on its own, we then start with the final proofs. The proof is basically divided into two
parts, the bias and the variance, where these terms are only approximately valid. The
reader not familiar with differential geometry is encouraged to first read the appendix on
basics of differential geometry in order to be equipped with the necessary background.



5.1 Non-compact Submanifolds in $\mathbb{R}^d$ with Boundary

We prove the pointwise convergence for non-compact submanifolds. Therefore we have to 
restrict the class of submanifolds since manifolds with unbounded curvature do not allow 
reasonable function spaces. 

Remark: In the rest of this paper we use the Einstein summation convention, that is,
indices occurring twice are summed over. Note that the definition of the curvature
tensor differs between textbooks. We use here the conventions regarding the definitions
of curvature etc. of Lee (1997).

[Figure 3 panels, from left to right: true Laplacian of f, estimated Laplacian of f, error.]

Figure 3: Illustration of the differences of the three graph Laplacians, random walk, unnormalized
and normalized (from the top) for $\lambda = 0$. The function $f$ is $f = \sum_{i=1}^{2} x_i - 4$ and
the 2500 samples come from a standard Gaussian distribution on $\mathbb{R}^2$. The neighborhood
size $h$ is set to 1.2.

Figure 4: Illustration of the effect of $\lambda = 0, 1, 2$ (rows 2-4) resulting in $s = 2, 0, -2$
for the sphere with a non-uniform data-generating probability measure and the function
$f(\theta, \phi) = \cos(\theta)$ (row 1) for the random walk Laplacian with $n = 2500$ and $h = 0.6$.

5.1.1 Manifolds with Boundary of Bounded Geometry

We will consider in general non-compact submanifolds with boundary. In textbooks on
Riemannian geometry one usually only finds material for the case where the manifold has
no boundary. Also the analysis, e.g. the definition of Sobolev spaces on non-compact Riemannian
manifolds, seems to be non-standard. We profit here very much from the thesis and
an accompanying paper of Schick (1996, 2001) which introduce manifolds with boundary
of bounded geometry. All material of this section is taken from these articles. Naturally
this plus of generality leads also to a slightly larger technical overload. Nevertheless we
think that it is worth this effort since the class of manifolds with boundary of bounded
geometry includes almost any kind of submanifold one could have in mind. Moreover, to
our knowledge, it is the most general setting where one can still introduce a notion of
Sobolev spaces with the usual properties.

Note that the boundary $\partial M$ is an isometric submanifold of $M$ of dimension $m - 1$. Therefore
it has a second fundamental form $\overline{\Pi}$ which should not be mixed up with the second
fundamental form $\Pi$ of $M$ which is with respect to the ambient space $\mathbb{R}^d$. We denote by
$\overline{\nabla}$ the connection and by $\overline{R}$ the curvature of $\partial M$. Moreover, let $\nu$ be the normal inward
vector field at $\partial M$.

Definition 9 (Manifold with boundary of bounded geometry) Let $M$ be a manifold
with boundary $\partial M$ (possibly empty). It is of bounded geometry if the following holds:

• (N) Normal Collar: there exists $r_C > 0$ so that the normal geodesic flow

$$K : (x,t) \mapsto \exp_x(t\,\nu_x)$$

is defined on $\partial M \times [0, r_C)$ and is a diffeomorphism onto its image ($\nu_x$ is the inward
normal vector). Let $N(s) := K(\partial M \times [0, s])$ be the collar set for $0 \le s < r_C$.

• (IC) The injectivity radius $\mathrm{inj}_{\partial M}$ of $\partial M$ is positive.

• (I) Injectivity radius of $M$: There is $r_i > 0$ so that if $r \le r_i$ then for $x \in M \setminus N(r)$ the
exponential map is a diffeomorphism on $B_M(0,r) \subset T_xM$ so that normal coordinates
are defined on every ball $B_M(x,r)$ for $x \in M \setminus N(r)$.

• (B) Curvature bounds: For every $k \in \mathbb{N}$ there is $C_k$ so that $|\nabla^i R| \le C_k$ and $|\overline{\nabla}^i\, \overline{\Pi}| \le C_k$
for $0 \le i \le k$, where $\nabla^i$ denotes the covariant derivative of order $i$.

Note that (B) imposes bounds on all orders of the derivatives of the curvatures. One could
also restrict the definition to the order of derivatives needed for the goals one pursues. But
this would require even more notational effort, therefore we skip this. In particular, in
Schick (1996) it is argued that boundedness of all derivatives of the curvature is very close
to the boundedness of the curvature alone.

The lower bound on the injectivity radius of $M$ and the bound on the curvature are
standard to define manifolds of bounded geometry without boundary. Now the problem
of the injectivity radius of $M$ is that at the boundary it somehow makes only partially
sense since $\mathrm{inj}_M(x) \to 0$ as $d(x, \partial M) \to 0$. Therefore one replaces next to the boundary
standard normal coordinates with normal collar coordinates.

Definition 10 (normal collar coordinates) Let $M$ be a Riemannian manifold with
boundary $\partial M$. Fix $x' \in \partial M$ and an orthonormal basis of $T_{x'}\partial M$ to identify $T_{x'}\partial M$ with
$\mathbb{R}^{m-1}$. For $r_1, r_2 > 0$ sufficiently small (such that the following map is injective) define
normal collar coordinates,

$$n_{x'} : B_{\mathbb{R}^{m-1}}(0, r_1) \times [0, r_2] \to M : (v,t) \mapsto \exp^{M}_{\exp^{\partial M}_{x'}(v)}(t\,\nu).$$

The pair $(r_1, r_2)$ is called the width of the normal collar chart $n_{x'}$.

The next proposition shows why manifolds of bounded geometry are especially interesting. 



Proposition 11 (Schick (2001)) Assume that conditions (N), (IC), (I) of Definition 9
hold.

• (B1) There exist $0 < R_1 \le r_{\mathrm{inj}}(\partial M)$, $0 < R_2 \le r_C$ and $0 < R_3 \le r_i$ and constants
$C_K > 0$ for each $K \in \mathbb{N}$ such that whenever we have normal boundary coordinates
of width $(r_1, r_2)$ with $r_1 \le R_1$ and $r_2 \le R_2$ or normal coordinates of radius $r_3 \le R_3$
then in these coordinates

$$|D^\alpha g_{ij}| \le C_K \quad \text{and} \quad |D^\alpha g^{ij}| \le C_K \quad \text{for all } |\alpha| \le K.$$

The condition (B) in Definition 9 holds if and only if (B1) holds. The constants $C_K$ can
be chosen to depend only on $r_i$, $r_C$, $\mathrm{inj}_{\partial M}$ and $C_k$.

Note that due to $g^{ij} g_{jk} = \delta^i_k$ one gets upper and lower bounds on the operator norms of
$g$ and $g^{-1}$, respectively, which result in upper and lower bounds for $\sqrt{\det g}$. This implies
that we have upper and lower bounds on the volume form $dV(x) = \sqrt{\det g}\, dx$.



Lemma 12 (Schick (2001)) Let $(M,g)$ be a Riemannian manifold with boundary of
bounded geometry of dimension $m$. Then there exists $R_0 > 0$ and constants $S_1 > 0$ and
$S_2$ such that for all $x \in M$ and $r \le R_0$ one has

$$S_1 r^m \le \mathrm{vol}(B_M(x,r)) \le S_2 r^m.$$

Another important tool for analysis on manifolds are appropriate function spaces. In order
to define a Sobolev norm one first has to fix a family of charts $U_i$ with $M \subset \cup_i U_i$ and then
define the Sobolev norm with respect to these charts. The resulting norm will depend
on the choice of the charts $U_i$. Since in differential geometry the choice of the charts
should not matter, the natural question arises how the Sobolev norm corresponding to a
different choice of charts $V_i$ is related to that for the choice $U_i$. In general, the Sobolev
norms will not be the same. However, if one assumes that the transition maps are smooth
and the manifold $M$ is compact then the resulting norms will be equivalent and therefore
define the same topology. Now if one has a non-compact manifold this argumentation
does not work anymore. This problem is solved in general by defining the norm with
respect to a covering of $M$ by normal coordinate charts. Then it can be shown that the
change of coordinates between these normal coordinate charts is well-behaved due to the
bounded geometry of $M$. In that way it is possible to establish a well-defined notion of
Sobolev spaces on manifolds with boundary of bounded geometry in the sense that any
norm defined with respect to a different covering of $M$ by normal coordinate charts is
equivalent. Let $(U_i, \phi_i)_{i \in I}$ be a countable covering of the submanifold $M$ with normal
coordinate charts of $M$, that is $M \subset \cup_{i \in I} U_i$, then:

$$\|f\|_{C^k(M)} = \max_{m \le k}\, \sup_{i \in I}\, \sup_{x \in \phi_i(U_i)} \big|D^m (f \circ \phi_i^{-1})(x)\big|.$$

In the following we will denote with $C^k(M)$ the space of $C^k$-functions on $M$ together with
the norm $\|\cdot\|_{C^k(M)}$.



5.1.2 Intrinsic versus Extrinsic Properties 

Most of the proofs for the continuous part will work with Taylor expansions in normal
coordinates. It is then of special interest to have a connection between intrinsic and extrinsic
distances. Since the distance on $M$ is induced from $\mathbb{R}^d$, it is obvious that one has
$\|x - y\|_{\mathbb{R}^d} \le d_M(x,y)$ for all $x, y \in M$ which are sufficiently close. The next proposition
proven by Smolyanov et al. (2007) provides an asymptotic expression of geometric quantities
of the submanifold $M$ in the neighborhood of a point $x \in M$. Particularly, it gives a
third-order approximation of the intrinsic distance $d_M(x,y)$ in $M$ in terms of the extrinsic
distance in the ambient space $X$ which is in our case just the Euclidean distance in $\mathbb{R}^d$.

Proposition 13 Let $i : M \to \mathbb{R}^d$ be an isometric embedding of the smooth $m$-dimensional
Riemannian manifold $M$ into $\mathbb{R}^d$. Let $x \in M$ and $V$ be a neighborhood of $0$ in $\mathbb{R}^m$ and let
$\Psi : V \to U$ provide normal coordinates of a neighborhood $U$ of $x$, that is $\Psi(0) = x$. Then
for all $y \in V$:

$$\|y\|^2_{\mathbb{R}^m} = d_M^2(x, \Psi(y)) = \|(i \circ \Psi)(y) - i(x)\|^2_{\mathbb{R}^d} + \frac{1}{12}\, \|\Pi(\dot\gamma, \dot\gamma)\|^2_{T_x\mathbb{R}^d} + O(\|y\|^5_{\mathbb{R}^m}),$$

where $\Pi$ is the second fundamental form of $M$ and $\gamma$ the unique geodesic from $x$ to $\Psi(y)$
such that $\dot\gamma = y^i \partial_{y^i}$. The volume form $dV = \sqrt{\det g_{ij}(y)}\, dy$ of $M$ satisfies in normal
coordinates,

$$dV = \Big(1 + \frac{1}{6} R_{iuvi}\, y^u y^v + O(\|y\|^3_{\mathbb{R}^m})\Big)\, dy,$$

in particular

$$\big(\Delta \sqrt{\det g_{ij}}\big)(0) = -\frac{1}{3} R,$$

where $R$ is the scalar curvature (i.e., $R = g^{ik} g^{jl} R_{ijkl}$).
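
To make the distance expansion concrete, it can be checked on a circle of radius $\rho$ in $\mathbb{R}^2$, where everything is explicit: the intrinsic distance is the arc length $d$, the Euclidean distance is the chord $2\rho\sin(d/(2\rho))$, and $\|\Pi(\dot\gamma,\dot\gamma)\| = d^2/\rho$ for a tangent vector of length $d$. The sketch below is an illustration added here under these assumptions; the function name `circle_distance_residual` is hypothetical.

```python
import math

def circle_distance_residual(d, rho=1.0):
    """Residual d^2 - ||chord||^2 - (1/12) ||Pi(gamma', gamma')||^2
    on a circle of radius rho, for two points at arc length d."""
    chord_sq = (2.0 * rho * math.sin(d / (2.0 * rho))) ** 2
    second_ff_sq = d**4 / rho**2      # ||Pi(v, v)||^2 with ||v|| = d
    return d**2 - chord_sq - second_ff_sq / 12.0

for d in (0.4, 0.2, 0.1):
    r = circle_distance_residual(d)
    # The residual divided by d^5 should shrink as d decreases.
    print(d, r, r / d**5)
```

For the circle the residual is in fact of order $d^6$, which is consistent with the $O(\|y\|^5)$ remainder in the proposition.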



We would like to note that in Smolyanov et al. (2007) this proposition was formulated for
general ambient spaces $X$, that is arbitrary Riemannian manifolds $X$. Using the more
general form of this proposition one could extend the results in this paper to submanifolds
of other ambient spaces $X$. However, in order to use the scheme one needs to know the
geodesic distances in $X$, which are usually not available for general Riemannian manifolds.
Nevertheless, for some special cases like the sphere, one knows the geodesic distances.
Submanifolds of the sphere could be of interest, for example in geophysics or astronomy.
The previous proposition is very helpful since it gives an asymptotic expression of the
geodesic distance $d_M(x,y)$ on $M$ in terms of the extrinsic Euclidean distance. The following
lemma is a non-asymptotic statement taken from Bernstein et al. (2001) which we
present in a slightly different form. But first we establish a connection between what they
call the 'minimum radius of curvature' and upper bounds on the extrinsic curvatures of
$M$ and $\partial M$. Let

$$\Pi_{\max} = \sup_{x \in M}\ \sup_{v \in T_xM,\, \|v\| = 1} \|\Pi(v,v)\|, \qquad \overline{\Pi}_{\max} = \sup_{x \in \partial M}\ \sup_{v \in T_x\partial M,\, \|v\| = 1} \|\overline{\Pi}(v,v)\|,$$

where $\overline{\Pi}$ is the second fundamental form of $\partial M$ as a submanifold of $M$. We set $\overline{\Pi}_{\max} = 0$
if the boundary $\partial M$ is empty.

Using the relation between the acceleration in the ambient space and the second fundamental
form for unit-speed curves $\gamma$ with no acceleration in $M$ ($D_t \dot\gamma = 0$) established in
Section A.3 we get for the Euclidean acceleration of such a curve $\gamma$ in $\mathbb{R}^d$,

$$\|\ddot\gamma\| = \|\Pi(\dot\gamma, \dot\gamma)\|.$$

Now if one has a non-empty boundary $\partial M$ it can happen that a length-minimizing
curve goes (partially) along the boundary (imagine $\mathbb{R}^d$ with a ball at the origin cut out).
Then the segment $c$ along the boundary will be a geodesic of the submanifold $\partial M$, see
Alexander and Alexander (1981), that is $\overline{D}_t \dot c = \overline{\nabla}_{\dot c}\, \dot c = 0$ where $\overline{\nabla}$ is the connection of $\partial M$
induced by $M$. However, $c$ will not be a geodesic in $M$ (in the sense of a curve with no
acceleration) since by the Gauss formula

$$D_t \dot c = \overline{D}_t \dot c + \overline{\Pi}(\dot c, \dot c) = \overline{\Pi}(\dot c, \dot c).$$

Therefore, in general the upper bound on the Euclidean acceleration of a length-minimizing
curve $\gamma$ in $M$ is given by,

$$\|\ddot\gamma\| = \|\overline{\Pi}(\dot\gamma, \dot\gamma) + \Pi(\dot\gamma, \dot\gamma)\| \le \overline{\Pi}_{\max} + \Pi_{\max}.$$

Using this inequality, one can derive a lower bound on the 'minimum radius of curvature'
$\rho$ defined in Bernstein et al. (2001) as $\rho = \inf\{1/\|\ddot\gamma\|_{\mathbb{R}^d}\}$ where the infimum is taken over
all unit-speed geodesics $\gamma$ of $M$ (in the sense of length-minimizing curves):

$$\rho \ge \frac{1}{\Pi_{\max} + \overline{\Pi}_{\max}}.$$

Finally we can formulate the Lemma from Bernstein et al. (2001).

Lemma 14 Let $x, y \in M$ with $d_M(x,y) \le \pi\rho$. Then

$$2\rho \sin\big(d_M(x,y)/(2\rho)\big) \le \|x - y\|_{\mathbb{R}^d} \le d_M(x,y).$$

Noting that $\sin(x) \ge x/2$ for $0 \le x \le \pi/2$, we get as an easier to handle corollary:

Corollary 15 Let $x, y \in M$ with $d_M(x,y) \le \pi\rho$. Then

$$\frac{1}{2}\, d_M(x,y) \le \|x - y\|_{\mathbb{R}^d} \le d_M(x,y).$$

In the given form this corollary is quite useless since we only have the Euclidean distances
between points and therefore we have no possibility to check the condition $d_M(x,y) \le \pi\rho$.
In general small Euclidean distance does not imply small intrinsic distance. Imagine a
circle where one has cut out a very small segment. Then the Euclidean distance between
the two ends is very small, however the geodesic distance is very large. We show now that
under an additional assumption one can transform the above corollary so that one can use
it when one has only knowledge about Euclidean distances.
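
The cut-circle example can be made quantitative with a few lines of code. The sketch below is an illustration added here, not part of the original argument; the function name `cut_circle_end_distances` and the parametrization by the opening angle of the removed arc are assumptions of the sketch.

```python
import math

def cut_circle_end_distances(gap_angle, rho=1.0):
    """Distances between the two ends of a circle of radius rho from which
    an arc of opening angle gap_angle has been removed.

    Returns (Euclidean distance, intrinsic distance along the remaining arc).
    """
    euclidean = 2.0 * rho * math.sin(gap_angle / 2.0)   # chord across the gap
    geodesic = rho * (2.0 * math.pi - gap_angle)        # length of the remaining arc
    return euclidean, geodesic

for gap in (0.5, 0.1, 0.01):
    e, g = cut_circle_end_distances(gap)
    # The lower bound d_M / 2 <= ||x - y|| of Corollary 15 fails here,
    # because d_M exceeds pi * rho.
    print(gap, e, g, 0.5 * g <= e)
```

As the gap shrinks the Euclidean distance tends to zero while the intrinsic distance tends to $2\pi\rho$, so the two ends violate the hypothesis $d_M(x,y) \le \pi\rho$ of Corollary 15.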

Lemma 16 Let $M$ have a finite radius of curvature $\rho > 0$. We further assume that,

$$\kappa := \inf_{x \in M}\ \inf_{y \in M \setminus B_M(x, \pi\rho)} \|x - y\|,$$

is non-zero. Then $B_{\mathbb{R}^d}(x, \kappa/2) \cap M \subset B_M(x, \kappa) \subset B_M(x, \pi\rho)$. Particularly, if $x, y \in M$
and $\|x - y\| \le \kappa/2$,

$$\frac{1}{2}\, d_M(x,y) \le \|x - y\|_{\mathbb{R}^d} \le d_M(x,y) \le \kappa.$$

Proof: By definition $\kappa$ is at most the infimum of $\|x - y\|$ where $y$ satisfies $d_M(x,y) = \pi\rho$.
Therefore the set $B_{\mathbb{R}^d}(x, \kappa/2) \cap M$ is a subset of $B_M(x, \pi\rho)$. The rest of the lemma then
follows by Corollary 15. Figure 5 illustrates this construction. □

Figure 5: $\kappa$ is the Euclidean distance of $x \in M$ to $M \setminus B_M(x, \pi\rho)$.



5.2 Notations and Assumptions 

In general we work on complete non-compact manifolds with boundary. Compared to a 
setting where one considers only compact manifolds one needs a slightly larger technical 
overhead. However, we will indicate how the technical assumptions simplify if one has a 
compact submanifold with boundary or even a compact manifold without boundary. 
We impose the following assumptions on the manifold M: 

Assumption 17

(i) The map $i : M \to \mathbb{R}^d$ is a smooth embedding,

(ii) The manifold $M$ with the metric induced from $\mathbb{R}^d$ is a smooth manifold with boundary
of bounded geometry (possibly $\partial M = \emptyset$),

(iii) $M$ has bounded second fundamental form,

(iv) It holds $\kappa := \inf_{x \in M} \inf_{y \in M \setminus B_M(x, \pi\rho)} \|i(x) - i(y)\| > 0$, where $\rho$ is the radius of
curvature defined in Section 5.1.2,

(v) For any $x \in M \setminus \partial M$, $\delta(x) := \inf_{y \in M \setminus B_M(x,\, \frac{1}{3}\min\{\mathrm{inj}(x),\, \pi\rho\})} \|i(x) - i(y)\|_{\mathbb{R}^d} > 0$, where
$\mathrm{inj}(x)$ is the injectivity radius at $x$ and $\rho > 0$ is the radius of curvature.

The first condition ensures that $i(M)$ is a smooth submanifold of $\mathbb{R}^d$. Usually we do not
distinguish between $i(M)$ and $M$. The use of the abstract manifold $M$ as a starting point
emphasizes that there exists an $m$-dimensional smooth manifold $M$ or, roughly equivalently,
an $m$-dimensional smooth parameter space underlying the data. The choice of the $d$
features determines then the representation in $\mathbb{R}^d$. The choice of features corresponds
therefore to a specific choice of the inclusion map $i$ since $i$ determines how $M$ is embedded
into $\mathbb{R}^d$. This means that another choice of features leads in general to a different mapping
$i$ but the initial abstract manifold $M$ is always the same. However, in the second condition
we assume that the metric structure of $M$ is induced by $\mathbb{R}^d$ (which implies that $i$ is trivially
an isometric embedding). Therefore the metric structure depends on the embedding $i$ or
equivalently on our choice of features.

The second condition ensures that $M$ is an isometric submanifold of $\mathbb{R}^d$ which is well-behaved.
As discussed in Section 5.1.1, manifolds of bounded geometry are in general non-compact,
complete Riemannian manifolds with boundary where one has uniform control
over all intrinsic curvatures. The uniform bounds on the curvature allow one to do reasonable
analysis in this general setting. In particular, it allows us to introduce the function spaces
$C^k(M)$ with their associated norm. It might be possible to prove pointwise results even
without the assumption of bounded geometry. But we think that the setting studied
here is already general enough to encompass all cases encountered in practice. The third
condition ensures that $M$ also has well-behaved extrinsic geometry and implies that the
radius of curvature $\rho$ is lower bounded. Together with the fourth condition it enables us to
get global upper and lower bounds of the intrinsic distance on $M$ in terms of the extrinsic
distance in $\mathbb{R}^d$ and vice versa, see Lemma 16. The fourth condition is only necessary in the
case of non-compact submanifolds. It prevents the manifold from self-approaching. More
precisely it ensures that if parts of $M$ are far away from $x$ in the geometry of $M$ they do
not come too close to $x$ in the geometry of $\mathbb{R}^d$. Assuming that $i(M)$ is a submanifold, this
assumption is already included implicitly. However, for non-compact submanifolds the
self-approaching could happen at infinity. Therefore we exclude it explicitly. Moreover, note
that for submanifolds with boundary one has $\mathrm{inj}(x) \to 0$ as $x$ approaches the boundary
$\partial M$. Therefore also $\delta(x) \to 0$ as $d(x, \partial M) \to 0$. However, this behavior of $\delta(x)$ at the
boundary does not matter for the proof of pointwise convergence in the interior of $M$.
Note that if $M$ is a smooth and compact manifold conditions (ii)-(v) hold automatically.

(Footnotes: In condition (v), note that the injectivity radius $\mathrm{inj}(x)$ is always positive. The fact
that $\mathrm{inj}(x) \to 0$ at the boundary is the reason why one replaces normal coordinates in the
neighborhood of the boundary with normal collar coordinates.)

In order to emphasize the distinction between extrinsic and intrinsic properties of the manifold
we always use the slightly cumbersome notations $x \in M$ (intrinsic) and $i(x) \in \mathbb{R}^d$
(extrinsic). The reader who is not familiar with Riemannian geometry should keep in
mind that locally, a submanifold of dimension $m$ looks like $\mathbb{R}^m$. This becomes apparent if
one uses normal coordinates. Also the following dictionary between terms of the manifold
$M$ and the case when one has only an open set in $\mathbb{R}^d$ ($i$ is then the identity mapping)
might be useful.



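Manifold $M$                     open set in $\mathbb{R}^d$
$g_{ij}$, $\sqrt{\det g}$        $\delta_{ij}$, $1$
natural volume element           Lebesgue measure
$\Delta_s$                       $\Delta_s f = \sum_{i=1}^d \frac{\partial^2 f}{\partial (z^i)^2} + \frac{s}{p} \sum_{i=1}^d \frac{\partial p}{\partial z^i} \frac{\partial f}{\partial z^i}$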


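The kernel functions which are used to define the weights of the graph are always functions
of the squared norm in $\mathbb{R}^d$. Furthermore, we make the following assumptions on the kernel
function $k$:

Assumption 18

(i) $k : \mathbb{R}_+ \to \mathbb{R}$ is measurable, non-negative and non-increasing on $\mathbb{R}_+^*$,

(ii) $k \in C^2(\mathbb{R}_+^*)$, that is in particular $k$, $\frac{\partial k}{\partial x}$ and $\frac{\partial^2 k}{\partial x^2}$ are bounded,

(iii) $k$, $|\frac{\partial k}{\partial x}|$ and $|\frac{\partial^2 k}{\partial x^2}|$ have exponential decay: $\exists\, c, \alpha, A \in \mathbb{R}_+$ such that for any $t \ge A$,
$f(t) \le c\, e^{-\alpha t}$, where $f(t) = \max\{k(t), |\frac{\partial k}{\partial x}|(t), |\frac{\partial^2 k}{\partial x^2}|(t)\}$,

(iv) $k(0) = 0$.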

The assumption that the kernel is non-increasing could be dropped, however it makes the
proof and the presentation easier. Moreover, in practice the weights of the neighborhood
graph which are determined by $k$ are interpreted as similarities. Therefore the usual choice
is to take weights which decrease with increasing distance. The fourth condition implies
that the graph has no loops. In particular, the kernel is not continuous at the origin.
All results hold also without this condition. The advantage of this condition is that some
estimators become unbiased. Also let us introduce the helpful notation $k_h(t) = \frac{1}{h^m}\, k\big(\frac{t}{h^2}\big)$,
where we call $h$ the bandwidth of the kernel. Moreover, we define the following two
constants related to the kernel function $k$,

$$C_1 = \int_{\mathbb{R}^m} k(\|y\|^2)\, dy < \infty, \qquad C_2 = \int_{\mathbb{R}^m} k(\|y\|^2)\, y_1^2\, dy < \infty. \qquad (13)$$
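
For a radially symmetric kernel the two constants in (13) reduce to one-dimensional integrals, $C_1 = |S^{m-1}| \int_0^\infty k(r^2)\, r^{m-1}\, dr$ and $C_2 = \frac{|S^{m-1}|}{m} \int_0^\infty k(r^2)\, r^{m+1}\, dr$, where $|S^{m-1}|$ is the surface area of the unit sphere. The following sketch evaluates them numerically; the Gaussian-type profile $k(t) = e^{-t}$, the truncation radius `r_max` and the function name `kernel_constants` are assumptions made for illustration only.

```python
import math
import numpy as np

def kernel_constants(kernel, m, r_max=12.0, num=200001):
    """Numerically evaluate C_1 and C_2 of (13) for a kernel acting on
    squared distances, using polar coordinates in R^m and the trapezoid rule."""
    r = np.linspace(0.0, r_max, num)
    dr = r[1] - r[0]
    # Surface area of the unit sphere S^{m-1} in R^m.
    area = 2.0 * math.pi ** (m / 2.0) / math.gamma(m / 2.0)

    def trapezoid(vals):
        return dr * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

    k = kernel(r**2)
    c1 = area * trapezoid(k * r ** (m - 1))
    # By symmetry, the integral of k * y_1^2 is 1/m of the integral of k * ||y||^2.
    c2 = area / m * trapezoid(k * r ** (m + 1))
    return c1, c2

c1, c2 = kernel_constants(lambda t: np.exp(-t), m=2)
print(c1, c2)   # for k(t) = exp(-t) and m = 2: pi and pi / 2
```

Setting $k(0) = 0$ as in condition (iv) changes the kernel only on a set of measure zero and therefore does not affect $C_1$ and $C_2$.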

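We also have some assumptions on the probability measure $P$.

Assumption 19

(i) $P$ is absolutely continuous with respect to the natural volume element $dV$ on $M$,

(ii) the density $p$ fulfills: $p \in C^3(M)$ and $p(x) > 0$, $\forall\, x \in M \setminus \partial M$,

(iii) the sample $X_i$, $i = 1, \ldots, n$ is drawn i.i.d. from $P$.

Note that condition (i) implies $P(\partial M) = 0$, that is the boundary $\partial M$ is a set of measure
zero. We will call the Assumptions 17 on the submanifold, Assumptions 18 on the kernel
function, and Assumptions 19 on the probability measure $P$ together the standard
assumptions.

(Footnote to Assumption 18 (iv): an edge from a vertex to itself is called a loop.)

In the following table we summarize the notation used in the proofs:

$k$    kernel function
$h > 0$    neighborhood/bandwidth parameter
$m \in \mathbb{N}$    dimension of the submanifold $M$
$k_h(t) = \frac{1}{h^m}\, k\big(\frac{t}{h^2}\big)$    scaled kernel function
$\lambda \in \mathbb{R}$    reweighting parameter
$d_{h,n}(x) = \frac{1}{n}\sum_{i=1}^n k_h(\|x - X_i\|^2)$    degree function associated with $k$
$\tilde{k}_{\lambda,h}(x, X_i) = \frac{k_h(\|x - X_i\|^2)}{[d_{h,n}(x)\, d_{h,n}(X_i)]^{\lambda}}$    reweighted kernel
$\tilde{d}_{\lambda,h,n}(x) = \frac{1}{n}\sum_{i=1}^n \tilde{k}_{\lambda,h}(x, X_i)$    degree function associated with $\tilde{k}_{\lambda,h}$
$(\tilde{A}_{\lambda,h,n} f)(x) = \frac{1}{n}\sum_{i=1}^n \tilde{k}_{\lambda,h}(x, X_i)\, f(X_i)$    empirical average operator $\tilde{A}_{\lambda,h,n}$
$\Delta^{(rw)}_{\lambda,h,n} f = \frac{1}{h^2}\big(f - \frac{1}{\tilde{d}_{\lambda,h,n}}\, \tilde{A}_{\lambda,h,n} f\big)$    random walk graph Laplacian
$\Delta^{(u)}_{\lambda,h,n} f = \frac{1}{h^2}\big(\tilde{d}_{\lambda,h,n}\, f - \tilde{A}_{\lambda,h,n} f\big)$    unnormalized graph Laplacian
$\Delta^{(n)}_{\lambda,h,n} f = \frac{1}{\sqrt{\tilde{d}_{\lambda,h,n}}}\, \Delta^{(u)}_{\lambda,h,n}\Big(\frac{f}{\sqrt{\tilde{d}_{\lambda,h,n}}}\Big)$    normalized graph Laplacian
$C_1 = \int_{\mathbb{R}^m} k(\|y\|^2)\, dy$, $C_2 = \int_{\mathbb{R}^m} k(\|y\|^2)\, y_1^2\, dy$    characteristic constants of the kernel
$p_h(x) = E_Z\, k_h(\|x - Z\|^2)$    convolution of $p$ with $k_h$
$(\tilde{A}_{\lambda,h} f)(x) = E_Z\, \tilde{k}_{\lambda,h}(x, Z)\, f(Z)$    average operator $\tilde{A}_{\lambda,h}$
$\Delta^{(rw)}_{\lambda,h}$, $\Delta^{(u)}_{\lambda,h}$, $\Delta^{(n)}_{\lambda,h}$    Laplacians associated with $\tilde{A}_{\lambda,h}$
$\Delta_s = \Delta_M + \frac{s}{p}\, g^{ab} (\nabla_a p) \nabla_b = \frac{1}{p^s}\, \mathrm{div}(p^s\, \mathrm{grad})$    s-th weighted Laplacian on $M$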
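
The empirical quantities in the table translate fairly directly into array operations when the Laplacians are evaluated at the sample points themselves. The following is a minimal sketch under stated assumptions: the kernel profile $k(t) = e^{-t}$ (with $k(0) = 0$ enforced by zeroing the diagonal), the function name `graph_laplacians` and the dense $O(n^2)$ implementation are illustrative choices and not the authors' code.

```python
import numpy as np

def graph_laplacians(X, f, h, lam, m=None, kernel=lambda t: np.exp(-t)):
    """Evaluate the random walk, unnormalized and normalized graph
    Laplacians of f at the sample points X.

    X: (n, d) samples, f: (n,) values f(X_i), h: bandwidth,
    lam: reweighting parameter lambda, m: intrinsic dimension (default d).
    `kernel` acts on squared distances; its default is an assumption.
    """
    X = np.asarray(X, dtype=float)
    f = np.asarray(f, dtype=float)
    n, d = X.shape
    m = d if m is None else m

    # Scaled kernel k_h(t) = k(t / h^2) / h^m on squared Euclidean distances.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = kernel(sq_dist / h**2) / h**m
    np.fill_diagonal(K, 0.0)                   # k(0) = 0: no loops

    deg = K.mean(axis=1)                       # d_{h,n}(X_j)
    K_tilde = K / np.outer(deg, deg) ** lam    # reweighted kernel
    deg_tilde = K_tilde.mean(axis=1)           # degree of the reweighted kernel
    Af = K_tilde @ f / n                       # empirical average operator

    rw = (f - Af / deg_tilde) / h**2
    un = (deg_tilde * f - Af) / h**2
    g = f / np.sqrt(deg_tilde)
    nm = (deg_tilde * g - K_tilde @ g / n) / (h**2 * np.sqrt(deg_tilde))
    return rw, un, nm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                  # Gaussian sample in R^2
f_vals = X.sum(axis=1) - 4.0                   # f(x) = sum_i x_i - 4
rw, un, nm = graph_laplacians(X, f_vals, h=1.2, lam=0.0)
print(rw[:3], un[:3], nm[:3])
```

By construction the unnormalized estimate equals the random walk estimate multiplied pointwise by the reweighted degree, and both vanish for constant functions; the normalized variant does not vanish on constants in general.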



5.3 Asymptotics of Euclidean Convolutions on the Submanifold M 

The following proposition describes the asymptotic expression of the convolution of a function
$f$ on the submanifold $M$ with a kernel function having the Euclidean distance $\|x - y\|$
as its argument with respect to the probability measure $P$ on $M$. This result is interesting
since it shows how the use of the Euclidean distance introduces a curvature effect if one
averages a function locally. A similar result has been presented in Coifman and Lafon
(2006). We define the density $p$ invariantly with respect to the natural volume element
and also explicitly give the second order curvature terms. Our proof is similar to that of
Smolyanov et al. (2007) where under stronger conditions a similar result was proven for
the Gaussian kernel. The more general setting and the use of general kernel functions
make the proof slightly more complicated. In order to emphasize the distinction between
extrinsic and intrinsic properties of the manifold we will use the slightly cumbersome
notations $x \in M$ (intrinsic) and $i(x) \in \mathbb{R}^d$ (extrinsic).

Proposition 20 Let $M$ and $k$ satisfy Assumptions 17 and 18. Furthermore, let $P$ have
a density $p$ with respect to the natural volume element and $p \in C^3(M)$. Then, for any
$x \in M \setminus \partial M$, there exists an $h_0(x) > 0$ such that for all $h < h_0(x)$ and any $f \in C^3(M)$,

$$\int_M k_h\big(\|i(x) - i(y)\|^2_{\mathbb{R}^d}\big)\, f(y)\, p(y)\, \sqrt{\det g}\, dy = C_1\, p(x) f(x) + \frac{h^2}{2}\, C_2 \Big(p(x) f(x) S(x) + (\Delta_M (pf))(x)\Big) + O(h^3),$$

where $O(h^3)$ is a function depending on $x$, $\|f\|_{C^3(M)}$ and $\|p\|_{C^3(M)}$ and

$$S(x) = \frac{1}{2}\Big[-R\big|_x + \frac{1}{2}\, \Big\|\sum_a \Pi(\partial_a, \partial_a)\Big\|^2_{T_{i(x)}\mathbb{R}^d}\Big],$$

where $R$ is the scalar curvature and $\Pi$ the second fundamental form of $M$.
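
The curvature term can be checked numerically in the simplest curved case, a circle of radius $\rho$ in $\mathbb{R}^2$ ($m = 1$). There the scalar curvature vanishes and $\|\Pi(\partial, \partial)\| = 1/\rho$, so $S(x) = 1/(4\rho^2)$. The sketch below assumes $f \equiv 1$, a constant density $p$ (which is divided out) and the kernel profile $k(t) = e^{-t}$, for which $C_1 = \sqrt{\pi}$ and $C_2 = \sqrt{\pi}/2$; these choices and the function names are illustrative assumptions.

```python
import math
import numpy as np

C1 = math.sqrt(math.pi)          # C_1 for k(t) = exp(-t), m = 1
C2 = math.sqrt(math.pi) / 2.0    # C_2 for k(t) = exp(-t), m = 1

def circle_convolution(h, rho=1.0, num=400001):
    """(1/p) * int_M k_h(||i(x) - i(y)||^2) p dV(y) on a circle of radius
    rho, with f = 1 and constant density p, via the trapezoid rule."""
    y = np.linspace(-math.pi * rho, math.pi * rho, num)    # arc length from x
    dy = y[1] - y[0]
    chord_sq = (2.0 * rho * np.sin(y / (2.0 * rho))) ** 2   # squared Euclidean distance
    vals = np.exp(-chord_sq / h**2) / h                     # k_h with m = 1
    return dy * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

def predicted(h, rho=1.0):
    """Second-order expansion C_1 + (h^2 / 2) C_2 S with S = 1 / (4 rho^2)."""
    return C1 + 0.5 * h**2 * C2 / (4.0 * rho**2)

for h in (0.2, 0.1, 0.05):
    print(h, circle_convolution(h) - C1, predicted(h) - C1)
```

The two printed corrections should agree to leading order in $h$, which illustrates that using Euclidean instead of intrinsic distances inflates the local average by a curvature-dependent factor.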

The following Lemma is an application of Bernstein's inequality. Together with the pre- 
vious proposition it will be the main ingredient for proving consistency statements for the 
graph structure. 

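Lemma 21 Suppose the standard assumptions hold and let the kernel $k$ have compact
support on $[0, R_k^2]$. Define $b_1 = \|k\|_\infty \|f\|_\infty$ and $b_2 = K \|f\|_\infty^2$,
where $K$ is a constant depending on $\|p\|_\infty$, $\|k\|_\infty$ and $R_k$. Let
$x \in M \setminus \partial M$ and $V_i := k_h(\|i(x) - i(X_i)\|^2) f(X_i)$. Then
for any bounded function $f$,

$$P\Big( \Big| \frac{1}{n} \sum_{i=1}^n V_i - E\,V \Big| > \varepsilon \Big)
\le 2 \exp\Big( - \frac{n h^m \varepsilon^2}{2 b_2 + 2 b_1 \varepsilon / 3} \Big).$$

Let $W_i = k_h(\|i(x) - i(X_i)\|^2)(f(x) - f(X_i))$. Then for $h R_k \le \kappa/2$ and $f \in C^1(M)$,

$$P\Big( \Big| \frac{1}{n} \sum_{i=1}^n W_i - E\,W \Big| > h \varepsilon \Big)
\le 2 \exp\Big( - \frac{n h^m \varepsilon^2}{2 b_2 + 2 b_1 \varepsilon / 3} \Big).$$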
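To get a feeling for the bound in Lemma 21, its right-hand side can be evaluated directly. The sketch below is only an illustration: the function name and the numerical values of $b_1$ and $b_2$ are hypothetical, chosen to show that the bound is informative only once $n h^m \varepsilon^2$ is large.

```python
import math

def bernstein_bound(n, h, m, eps, b1, b2):
    """Right-hand side 2 exp(-n h^m eps^2 / (2 b2 + 2 b1 eps / 3)), capped at 1."""
    exponent = n * h**m * eps**2 / (2.0 * b2 + 2.0 * b1 * eps / 3.0)
    return min(1.0, 2.0 * math.exp(-exponent))

if __name__ == "__main__":
    # Illustrative constants b1 = b2 = 1 (assumptions, not values from the paper).
    for n in (10**2, 10**4, 10**6):
        print(n, bernstein_bound(n, h=0.1, m=2, eps=0.1, b1=1.0, b2=1.0))
```

For fixed $\varepsilon$ the bound decays only if $n h^m \to \infty$, which is why the bandwidth must not shrink too quickly with the sample size.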



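Proof: Since by assumption $\kappa > 0$, by Lemma 16 for any $x, y \in M$ with
$\|i(x) - i(y)\| \le \kappa/2$, we have $d_M(x,y) \le 2\|i(x) - i(y)\|$. This implies that
for all $a \le \kappa/2$, $B_{\mathbb{R}^d}(x,a) \cap M \subset B_M(x,2a)$.

Let $W_i := k_h(\|i(x) - i(X_i)\|^2) f(X_i)$. We have

$$|W_i| \le \frac{\|k\|_\infty}{h^m} \sup_{y \in M} |f(y)| \le \frac{\|k\|_\infty}{h^m} \|f\|_\infty = \frac{b_1}{h^m}.$$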



For the variance of $W$ we have two cases. First let $hR_k \le s := \min\{\kappa/2, R_0/2\}$.
Then we get



Varir < Kzklm^) " i{Z)f)f\Z) < 



\k\ 



\k\ 



where we have used Lemma 43 in the last step. Now consider $hR_k > s$, then

1 1 1 1 2 Tjm II r„ 1 1 2 



VarVF < 



h^ 



Rl 



00 II fW^ ^ 

00 - smj^m 



00 II ^||2 

00 



Therefore we define b2 = K 
stein's inequality we finally get 



^ withK = i?-||fc||^ 



max{2'"52 HpIL > By Bern- 



$$P\Big[ \Big| \frac{1}{n} \sum_{i=1}^n W_i - E\,W \Big| > \varepsilon \Big]
\le 2 \exp\Big( - \frac{n h^m \varepsilon^2}{2 b_2 + 2 b_1 \varepsilon / 3} \Big).$$

Both constants $b_2$ and $b_1$ are independent of $x$. For the second part note that by Lemma 16,
for $hR_k \le \kappa/2$, we have that $\|x - y\| \le hR_k$ implies $d_M(x,y) \le 2\|x - y\| \le 2hR_k$.
In particular, for all $x, y \in M$ with $\|x - y\| \le hR_k$,

$$|f(x) - f(y)| \le \sup_{y \in M} \|\nabla f\|_{T_yM}\; d_M(x,y) \le 2 h R_k \sup_{y \in M} \|\nabla f\|_{T_yM}.$$

A similar reasoning as above leads then to the second statement. □
Note that $E_Z\, k_h(\|i(x) - i(Z)\|^2) f(Z) = \int_M k_h(\|i(x) - i(y)\|^2) f(y) p(y) \sqrt{\det g}\; dy$.

5.4 Pointwise Consistency of the Random Walk, Unnormalized and Normalized Graph Laplacian

The proof of the convergence result for the three graph Laplacians is organized as follows.
First we introduce the continuous operators $\Delta^{(rw)}_{\lambda,h}$, $\Delta^{(u)}_{\lambda,h}$
and $\Delta^{(n)}_{\lambda,h}$. Then we derive the limit of the continuous operators as $h \to 0$.
This part of the proof is concerned with the bias part since roughly $(\Delta_{\lambda,h} f)(x)$
can be seen as the expectation of $(\Delta_{\lambda,h,n} f)(x)$. Second we show that with high
probability all extended graph Laplacians are close to the corresponding continuous operators.
This is the variance part. Combining both results we arrive finally at the desired consistency
results.



5.4.1 The Bias Part - Deviation of $\Delta_{\lambda,h}$ from its Limit

The following continuous approximation of $\Delta^{(rw)}$ was similarly introduced in Lafon (2004);
Coifman and Lafon (2006).



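Definition 22 (Kernel-based approximation of the Laplacian) We introduce the
following averaging operator $A_{\lambda,h}$ based on the reweighted kernel $k_{\lambda,h}$:

$$(A_{\lambda,h} f)(x) = \int_M k_{\lambda,h}(x,y)\, f(y)\, p(y)\, \sqrt{\det g}\; dy, \qquad (14)$$

and with $d_{\lambda,h} = (A_{\lambda,h} 1)$ the following continuous operators:

$$\text{random walk:} \quad \Delta^{(rw)}_{\lambda,h} f := \frac{1}{h^2}\Big( f - \frac{1}{d_{\lambda,h}} A_{\lambda,h} f \Big) = \frac{1}{h^2} \Big( \frac{A_{\lambda,h}\, g}{d_{\lambda,h}} \Big)(x),$$

$$\text{unnormalized:} \quad \Delta^{(u)}_{\lambda,h} f := \frac{1}{h^2}\big( d_{\lambda,h} f - A_{\lambda,h} f \big) = \frac{1}{h^2} (A_{\lambda,h}\, g)(x),$$

$$\text{normalized:} \quad \Delta^{(n)}_{\lambda,h} f := \frac{1}{h^2 \sqrt{d_{\lambda,h}}}\Big( d_{\lambda,h} \frac{f}{\sqrt{d_{\lambda,h}}} - A_{\lambda,h} \frac{f}{\sqrt{d_{\lambda,h}}} \Big) = \frac{1}{h^2 \sqrt{d_{\lambda,h}(x)}} (A_{\lambda,h}\, g')(x),$$

where we have introduced again $g(y) := f(x) - f(y)$ and
$g'(y) := \frac{f(x)}{\sqrt{d_{\lambda,h}(x)}} - \frac{f(y)}{\sqrt{d_{\lambda,h}(y)}}$.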

The definition of the normalized approximation $\Delta^{(rw)}_{\lambda,h}$ can be justified by the
alternative definition of the Laplacian in $\mathbb{R}^d$ sometimes made in physics textbooks:

$$(\Delta f)(x) = \lim_{r \to 0} -\frac{1}{C_d\, r^2} \Big( f(x) - \frac{1}{\mathrm{vol}(B(x,r))} \int_{B(x,r)} f(y)\, dy \Big),$$

where $C_d$ is a constant depending on the dimension $d$.
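The ball-average characterization of the Laplacian can be verified numerically. The sketch below is an illustration under stated assumptions: it works in $\mathbb{R}^2$, approximates the ball average by a midpoint rule in polar coordinates, and uses the value $C_d = 1/(2(d+2))$, which follows from the standard second-order expansion of the mean over a ball (the text above only states that $C_d$ depends on $d$). Function names are hypothetical.

```python
import math

def ball_average_2d(f, x0, y0, r, n_r=200, n_phi=200):
    """Average of f over the disc B((x0, y0), r), midpoint rule in polar coordinates."""
    total = 0.0
    for a in range(n_r):
        rho = (a + 0.5) * r / n_r
        for b in range(n_phi):
            phi = (b + 0.5) * 2.0 * math.pi / n_phi
            # The factor rho is the Jacobian of polar coordinates.
            total += f(x0 + rho * math.cos(phi), y0 + rho * math.sin(phi)) * rho
    cell = (r / n_r) * (2.0 * math.pi / n_phi)
    return total * cell / (math.pi * r**2)

def laplacian_from_ball_average(f, x0, y0, r, d=2):
    """Finite-r version of (Delta f)(x) = lim -(f(x) - ball average) / (C_d r^2)."""
    c_d = 1.0 / (2.0 * (d + 2))  # assumed value of the dimension constant
    return -(f(x0, y0) - ball_average_2d(f, x0, y0, r)) / (c_d * r**2)

if __name__ == "__main__":
    quadratic = lambda x, y: x * x + y * y   # Laplacian equals 4 everywhere
    harmonic = lambda x, y: x * x - y * y    # Laplacian equals 0 everywhere
    print(laplacian_from_ball_average(quadratic, 0.3, -0.2, 0.1))
    print(laplacian_from_ball_average(harmonic, 0.3, -0.2, 0.1))
```

The operator $\Delta^{(rw)}_{\lambda,h}$ has the same structure, with the indicator of the ball replaced by the reweighted kernel and the Lebesgue measure replaced by $p(y)\sqrt{\det g}\,dy$.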

Approximations of the Laplace-Beltrami operator based on averaging with the Gaussian
kernel in the case of a uniform probability measure have been studied for compact
submanifolds without boundary by Smolyanov et al. (2000, 2007) and Belkin (2003). Their
result was then generalized by Lafon (2004) to general densities and to a wider class of
isotropic, positive definite kernels for compact submanifolds with boundary. The proof
given in Lafon (2004) applies only to compact hypersurfaces in $\mathbb{R}^d$; a proof for the
general case of compact submanifolds with boundary using boundary conditions has been
presented in Coifman and Lafon (2006). In this section we will prove the pointwise
convergence of the continuous approximation for general submanifolds $M$ with boundary of
bounded geometry with the additional Assumptions 17. This includes the case where $M$ is
not compact. Moreover, no assumptions of positive definiteness of the kernel are made nor
is any boundary condition on the function $f$ imposed. Almost any submanifold occurring
in practice should be covered in this very general setting.

For pointwise convergence in the interior of the manifold M boundary conditions on / 
are not necessary. However, for uniform convergence there is no way around them. Then 
the problem lies not in the proof that the continuous approximation still converges in the 
right way but in the transfer of the boundary condition to the discrete graph. The main 
problem is that since we have no information about M apart from the random samples the 
boundary will be hard to locate. Moreover, since the boundary is a set of measure zero, 
we will actually almost surely never sample any point from the boundary. The rigorous
treatment of the approximation of the boundary, or of the boundary conditions of
a function on a randomly sampled graph, remains an open problem.
Especially for dimensionality reduction the case of low-dimensional submanifolds in
$\mathbb{R}^d$ is important. Notably, the analysis below also includes the case where due to
noise the data is only concentrated around a submanifold.

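Theorem 23 Suppose the standard assumptions hold. Furthermore, let $k$ be a kernel with
compact support on $[0, R_k^2]$. Let $\lambda \in \mathbb{R}$, and $x \in M \setminus \partial M$.
Then there exists an $h_1(x) > 0$ such that for all $h < h_1(x)$ and any $f \in C^3(M)$,

$$(\Delta^{(rw)}_{\lambda,h} f)(x) = -\frac{C_2}{2 C_1} \Big( (\Delta_M f)(x) + \frac{s}{p(x)} \langle \nabla p, \nabla f \rangle_{T_xM} \Big) + O(h) = -\frac{C_2}{2 C_1} (\Delta_s f)(x) + O(h),$$

where $\Delta_M$ is the Laplace-Beltrami operator of $M$ and $s = 2(1 - \lambda)$.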
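The dependence of the limit on $\lambda$ can be made concrete with a small quadrature experiment. The following sketch rests on several assumptions that are not taken from the theorem itself: $M$ is the unit circle, the kernel is $k(s) = e^{-s}$ with $k_h(s) = h^{-1} k(s/h^2)$ (not compactly supported, which is harmless here because the circle is compact), so that $C_2/(2C_1) = 1/4$, and the reweighted kernel is taken as $k_h(\|i(x)-i(y)\|^2)/[p_h(x) p_h(y)]^\lambda$. The density and test function are arbitrary illustrative choices and all names are hypothetical.

```python
import math

def continuous_rw_laplacian_on_circle(f, p, i0, lam, h, n_grid=400):
    """Evaluate (Delta^{(rw)}_{lambda,h} f) at grid point i0 of the unit circle by quadrature."""
    d_theta = 2.0 * math.pi / n_grid
    thetas = [j * d_theta for j in range(n_grid)]
    # Kernel values depend only on the index offset between two grid points.
    kern = [math.exp(-(2.0 - 2.0 * math.cos(j * d_theta)) / h**2) / h
            for j in range(n_grid)]
    p_vals = [p(t) for t in thetas]
    f_vals = [f(t) for t in thetas]
    # p_h(y) = int k_h(||i(y) - i(z)||^2) p(z) dz, computed for every grid point y.
    p_h = [sum(kern[(i - j) % n_grid] * p_vals[j] for j in range(n_grid)) * d_theta
           for i in range(n_grid)]
    num = 0.0  # (A_{lambda,h} g)(x) with g(y) = f(x) - f(y)
    den = 0.0  # d_{lambda,h}(x) = (A_{lambda,h} 1)(x)
    for j in range(n_grid):
        weight = (kern[(i0 - j) % n_grid] / (p_h[i0] * p_h[j]) ** lam
                  * p_vals[j] * d_theta)
        num += weight * (f_vals[i0] - f_vals[j])
        den += weight
    return num / (den * h**2)

def predicted_limit(theta, s):
    """-(C_2 / (2 C_1)) (f'' + s (p'/p) f') for f = sin and p proportional to 1 + cos/2."""
    grad_log_p = -0.5 * math.sin(theta) / (1.0 + 0.5 * math.cos(theta))
    return -0.25 * (-math.sin(theta) + s * grad_log_p * math.cos(theta))

if __name__ == "__main__":
    density = lambda t: (1.0 + 0.5 * math.cos(t)) / (2.0 * math.pi)
    i0, n_grid = 64, 400
    theta0 = i0 * 2.0 * math.pi / n_grid
    for lam in (0.0, 1.0):
        s = 2.0 * (1.0 - lam)
        print(lam, continuous_rw_laplacian_on_circle(math.sin, density, i0, lam, 0.1),
              predicted_limit(theta0, s))
```

For $\lambda = 1$ (that is $s = 0$) the density drops out of the limit and only the Laplace-Beltrami operator remains, whereas for $\lambda = 0$ the drift term $\frac{2}{p}\langle \nabla p, \nabla f \rangle$ is clearly visible.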



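A hypersurface is a submanifold of codimension 1.

Proof: For sufficiently small $h$ we have $B_{\mathbb{R}^d}(x, 2hR_k) \cap M \cap \partial M = \emptyset$.
Moreover, it can be directly seen from the proof of Proposition 20 that the upper bound of the
interval $[0, h_0(y)]$ for which the expansion holds depends continuously on $\delta(x)$ and
$\epsilon(y)$, where $\epsilon(y) = \frac{1}{3} \min\{\pi\rho, \mathrm{inj}(y)\}$. Now $h_0(x)$ is
continuous since $\mathrm{inj}(x)$ is continuous on compact subsets, see Klingenberg (1982)
[Prop. 2.1.10], and $\delta(x)$ is continuous since the injectivity radius is continuous.
Therefore we conclude that since $h_0(y)$ is continuous on $B(x, 2hR_k) \cap M$ and $h_0(y) > 0$,

$$h_1(x) = \inf_{y \in B(x, 2hR_k) \cap M} h_0(y) > 0.$$

Then for the interval $(0, h_1(x))$ the expansion of $p_h(y)$ holds uniformly over the whole set
$B(x, 2hR_k) \cap M$. That is, using the definition of $k_{\lambda,h}$ as well as Proposition 20
and the expansion $\frac{1}{(a + h^2 b)^\lambda} = \frac{1}{a^\lambda} - \lambda \frac{h^2 b}{a^{\lambda+1}} + O(h^4)$,
we get for $h \in (0, h_1(x))$ that

$$\int_M k_{\lambda,h}\big(\|i(x) - i(y)\|^2\big) f(y) p(y) \sqrt{\det g}\, dy
= \int_{B_{\mathbb{R}^d}(x, hR_k) \cap M} \frac{k_h(\|i(x) - i(y)\|^2)}{p_h^\lambda(x)}\, f(y)
\Big[ \frac{C_1 p(y) - \frac{\lambda}{2} C_2 h^2 (p(y) S + \Delta p)}{C_1^{\lambda+1} p(y)^\lambda} + O(h^3) \Big] \sqrt{\det g}\, dy,$$

where the $O(h^3)$-term is continuous on $B_{\mathbb{R}^d}(x, hR_k)$ and we have introduced the
abbreviation $S = \frac{1}{2}\big[ -R + \frac{1}{2} \| \sum_a \Pi(\partial_a, \partial_a) \|^2_{T_{i(x)}\mathbb{R}^d} \big]$.
Using $f(y) = 1$ we get

$$d_{\lambda,h}(x) = \int_{B_{\mathbb{R}^d}(x, hR_k) \cap M} \frac{k_h(\|i(x) - i(y)\|^2)}{p_h^\lambda(x)}
\Big[ \frac{C_1 p(y) - \frac{\lambda}{2} C_2 h^2 (p(y) S + \Delta p)}{C_1^{\lambda+1} p(y)^\lambda} + O(h^3) \Big] \sqrt{\det g}\, dy$$

as an estimate for $d_{\lambda,h}(x)$. Now using Proposition 20 again, we arrive at:

$$\Delta^{(rw)}_{\lambda,h} f = \frac{1}{h^2}\, \frac{d_{\lambda,h} f - A_{\lambda,h} f}{d_{\lambda,h}}
= -\frac{C_2}{2 C_1} \Big( \Delta_M f + \frac{s}{p} \langle \nabla p, \nabla f \rangle \Big) + O(h),$$

where all $O(h)$-terms are finite on $B_{\mathbb{R}^d}(x, hR_k) \cap M$ since $p$ is strictly positive. □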

Note that the limit of $\Delta^{(rw)}_{\lambda,h}$ has the opposite sign of $\Delta_s$. This is due to
the fact that the Laplace-Beltrami operator on manifolds is usually defined as a negative
definite operator (in analogy to the Laplace operator in $\mathbb{R}^d$), whereas the graph
Laplacian is positive definite. But this varies through the literature, thus the reader
should be aware of the sign convention.

Remark: The assumption of compact support of the kernel k is only necessary in the case 
of non-compact manifolds M . For compact manifolds a kernel with non-compact support, 
such as a Gaussian kernel, would work, too. The reason for compact support of the kernel 
comes from the fact that for non-compact manifolds there exists no lower bound on a 
strictly positive density. This in turn implies that one cannot upper bound the convolution 
with the reweighted kernel if one does not impose additional assumptions on the density. 
In practice the solution of graph-based methods for large-scale problems is usually only 
possible for sparse neighborhood graphs. Therefore the compactness assumption of the 
kernel is quite realistic and does not exclude relevant cases. With the relation

$$(\Delta^{(u)}_{\lambda,h} f)(x) = d_{\lambda,h}(x)\, (\Delta^{(rw)}_{\lambda,h} f)(x)$$

one can easily adapt the last lines of the previous proof to derive the following corollary.

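Corollary 24 Under the assumptions of Theorem 23, let $\lambda \in \mathbb{R}$ and
$x \in M \setminus \partial M$. Then there exists an $h_1(x) > 0$ such that for all $h < h_1(x)$
and any $f \in C^3(M)$,

$$(\Delta^{(u)}_{\lambda,h} f)(x) = -\frac{C_2}{2 C_1^{2\lambda}}\, p(x)^{1 - 2\lambda} (\Delta_s f)(x) + O(h), \quad \text{where } s = 2(1 - \lambda),$$

$$(\Delta^{(n)}_{\lambda,h} f)(x) = -\frac{C_2}{2 C_1}\, p(x)^{\frac{1}{2} - \lambda}\, \Delta_s \Big( \frac{f}{p^{\frac{1}{2} - \lambda}} \Big)(x) + O(h).$$

5.4.2 The Variance Part - Deviation of $\Delta_{\lambda,h,n}$ from $\Delta_{\lambda,h}$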

Before we state the results for the general case with data-dependent weights we now
treat the case $\lambda = 0$, that is we have non-data-dependent weights. There the proof is
considerably simpler and much easier to follow. Moreover, as opposed to the general case,
here we get convergence in probability under slightly weaker conditions than almost sure
convergence. Since this does not hold for the normalized graph Laplacian, in that case we
will only provide the general proof.



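Theorem 25 (Weak and strong pointwise consistency for $\lambda = 0$) Suppose the standard
assumptions hold. Furthermore, let $k$ be a kernel with compact support on $[0, R_k^2]$.
Let $x \in M \setminus \partial M$ and $f \in C^3(M)$. Then if $h \to 0$ and $n h^{m+2} \to \infty$,

$$\lim_{n \to \infty} (\Delta^{(rw)}_{0,h,n} f)(x) = -\frac{C_2}{2 C_1} (\Delta_2 f)(x) \quad \text{in probability},$$

$$\lim_{n \to \infty} (\Delta^{(u)}_{0,h,n} f)(x) = -\frac{C_2}{2}\, p(x) (\Delta_2 f)(x) \quad \text{in probability}.$$

If even $n h^{m+2} / \log n \to \infty$, then almost sure convergence holds.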

Proof: We give the proof for $\Delta^{(rw)}_{0,h,n}$. The proof for $\Delta^{(u)}_{0,h,n}$ can be
directly derived with the second statement of Lemma 21 for the variance term together with
Corollary 24 for the bias term. Similar to the proof for the Nadaraya-Watson regression
estimate of Greblicki et al. (1984), we rewrite the estimator $\Delta^{(rw)}_{0,h,n} f$ in the
following form

$$(\Delta^{(rw)}_{0,h,n} f)(x) = \frac{1}{h^2}\, \frac{(C_{0,h} f)(x) + B_{1n}}{1 + B_{2n}}, \qquad (15)$$

where

$$(C_{0,h} f)(x) = \frac{E_Z\, k_h(\|i(x) - i(Z)\|^2)\, g(Z)}{E_Z\, k_h(\|i(x) - i(Z)\|^2)},$$

$$B_{1n} = \frac{\frac{1}{n} \sum_{j=1}^n k_h(\|i(x) - i(X_j)\|^2)\, g(X_j) - E_Z\, k_h(\|i(x) - i(Z)\|^2)\, g(Z)}{E_Z\, k_h(\|i(x) - i(Z)\|^2)},$$

$$B_{2n} = \frac{\frac{1}{n} \sum_{j=1}^n k_h(\|i(x) - i(X_j)\|^2) - E_Z\, k_h(\|i(x) - i(Z)\|^2)}{E_Z\, k_h(\|i(x) - i(Z)\|^2)},$$

with $g(y) := f(x) - f(y)$. In Theorem 23 we have shown that for $x \in M \setminus \partial M$,

$$\lim_{h \to 0} (\Delta^{(rw)}_{0,h} f)(x) = \lim_{h \to 0} \frac{1}{h^2} (C_{0,h} f)(x) = -\frac{C_2}{2 C_1} (\Delta_2 f)(x). \qquad (16)$$

Using the lower bound of $p_h(x) = E_Z\, k_h(\|i(x) - i(Z)\|^2)$ derived in Lemma 43 we can for
$hR_k \le \kappa/2$ directly apply Lemma 21. Thus there exist constants $d_1$ and $d_2$ such that

$$P\big( |B_{1n}| \ge h^2 t \big) \le 2 \exp\Big( - \frac{n h^{m+2} t^2}{2 \|k\|_\infty (d_2 + t h d_1 / 3)} \Big).$$

The same analysis can be done for $B_{2n}$. This shows convergence in probability. Complete
convergence (which implies almost sure convergence) can be shown by proving for all $t > 0$
the convergence of the series $\sum_{n=1}^\infty P(|B_{1n}| \ge h^2 t) < \infty$. A sufficient
condition for that is $n h^{m+2} / \log n \to \infty$ as $n \to \infty$. □



The weak pointwise consistency of the unnormalized graph Laplacian for compact
submanifolds with the uniform probability measure using the Gaussian kernel for the weights
and $\lambda = 0$ was proven by Belkin and Niyogi (2005). A more general result appeared
independently in Hein et al. (2005). We prove here the limits of all three graph Laplacians
for general submanifolds with boundary of bounded geometry, general probability measures
$P$, and general kernel functions $k$ as stated in our standard assumptions.
The rest of this section is devoted to the general case $\lambda \neq 0$. We show that with high
probability the extended graph Laplacians $\Delta_{\lambda,h,n}$ are pointwise close to the
continuous operators $\Delta_{\lambda,h}$ when applied to a function $f \in C^3(M)$. The following
proposition is helpful.



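Proposition 26 Suppose the standard assumptions hold. Furthermore, let $k$ be a kernel
with compact support on $[0, R_k^2]$. Fix $\lambda \in \mathbb{R}$ and let $x \in M \setminus \partial M$,
$f \in C^3(M)$ and define $g(y) := f(x) - f(y)$. Then there exists a constant $C$ such that for any
$\frac{2\|k\|_\infty}{n h^m} < \varepsilon < 1/C$, $0 < h < h_{\max}$, the following events hold with
probability at least $1 - C n e^{-\frac{n h^m \varepsilon^2}{C}}$:

$$|(A_{\lambda,h,n}\, g)(x) - (A_{\lambda,h}\, g)(x)| \le \varepsilon h, \qquad |d_{\lambda,h,n}(x) - d_{\lambda,h}(x)| \le \varepsilon.$$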

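Proof: The idea of this proof is to show that several empirical quantities which can be
expressed as a sum of i.i.d. random variables are close to their expectation. Then one can
deduce that also $(A_{\lambda,h,n}\, g)(x)$ will be close to $(A_{\lambda,h}\, g)(x)$. The proof for
$d_{\lambda,h,n}$ can then be easily adapted from the following. We consider here only
$\lambda \ge 0$, the proof for $\lambda < 0$ is even simpler. Consider the event $\mathcal{E}$ for
which one has

$$\text{for any } j \in \{1, \ldots, n\}, \quad |d_{h,n}(X_j) - p_h(X_j)| \le \varepsilon,$$
$$|d_{h,n}(x) - p_h(x)| \le \varepsilon,$$
$$\Big| \frac{1}{n} \sum_{j=1}^n k_h(\|i(x) - i(X_j)\|^2)(f(x) - f(X_j)) [p_h(x) p_h(X_j)]^{-\lambda}
- \int_M k_h(\|i(x) - i(y)\|^2)(f(x) - f(y)) [p_h(x) p_h(y)]^{-\lambda} p(y) \sqrt{\det g}\, dy \Big| \le h \varepsilon.$$

We will now prove that for sufficiently large $C$ the event $\mathcal{E}$ holds with probability at
least $1 - C n e^{-\frac{n h^m \varepsilon^2}{C}}$. For the second assertion defining $\mathcal{E}$, we
use Lemma 21:

$$P\big( |d_{h,n}(x) - p_h(x)| > \varepsilon \big) \le 2 \exp\Big( - \frac{n h^m \varepsilon^2}{2 b_2 + 2 b_1 \varepsilon / 3} \Big),$$

where $b_1$ and $b_2$ are constants depending on the kernel $k$ and $p$. For the first term in the
event $\mathcal{E}$ remember that $k(0) = 0$. We get for $\frac{\|k\|_\infty}{n h^m} \le \varepsilon/2$ and
$1 \le j \le n$,

$$P\Big( \Big| \frac{1}{n} \sum_{i=1}^n k_h(\|i(X_j) - i(X_i)\|^2) - p_h(X_j) \Big| > \varepsilon \;\Big|\; X_j \Big)
\le 2 \exp\Big( - \frac{(n-1) h^m \varepsilon^2}{8 b_2 + 4 b_1 \varepsilon / 3} \Big).$$

This follows by

$$\Big| \frac{1}{n} \sum_{i=1}^n k_h(\|i(X_j) - i(X_i)\|^2) - p_h(X_j) \Big|
\le \Big| \frac{1}{n(n-1)} \sum_{i \ne j} k_h(\|i(X_j) - i(X_i)\|^2) \Big|
+ \Big| \frac{1}{n-1} \sum_{i \ne j} k_h(\|i(X_j) - i(X_i)\|^2) - p_h(X_j) \Big|,$$

where the first term is upper bounded by $\frac{\|k\|_\infty}{n h^m}$. First integrating with respect
to the law of $X_j$ (the right hand side of the bound is independent of $X_j$) and then using a
union bound, we get

$$P\big( \text{for any } j \in \{1, \ldots, n\},\ |d_{h,n}(X_j) - p_h(X_j)| \le \varepsilon \big)
\ge 1 - 2 n \exp\Big( - \frac{(n-1) h^m \varepsilon^2}{8 b_2 + 4 b_1 \varepsilon / 3} \Big).$$

Noting that $\frac{1}{(p_h(x) p_h(y))^\lambda}$ is upper bounded by Lemma 43 we get by Lemma 21 for
$hR_k \le \kappa/2$ a Bernstein type bound for the probability of the third event in $\mathcal{E}$.
Finally, combining all these results, we obtain that there exists a constant $C$ such that for
$h \le \frac{\kappa}{2 R_k}$ and $\frac{2\|k\|_\infty}{n h^m} \le \varepsilon \le 1$, the event $\mathcal{E}$
holds with probability at least $1 - C n e^{-\frac{n h^m \varepsilon^2}{C}}$. Let us define

$$B := \int_M k_h(\|i(x) - i(y)\|^2)(f(x) - f(y)) [p_h(x) p_h(y)]^{-\lambda} p(y) \sqrt{\det g}\, dy,$$
$$\hat{B} := \frac{1}{n} \sum_{j=1}^n k_h(\|i(x) - i(X_j)\|^2)(f(x) - f(X_j)) [d_{h,n}(x) d_{h,n}(X_j)]^{-\lambda},$$

then $(A_{\lambda,h,n}\, g)(x) = \hat{B}$ and $(A_{\lambda,h}\, g)(x) = B$. Let us now work only on the
event $\mathcal{E}$. By Lemma 43 for any $y \in B_{\mathbb{R}^d}(x, hR_k) \cap M$ there exist constants
$D_1, D_2$ such that $0 < D_1 \le p_h(y) \le D_2$. Using the first order Taylor formula of
$[x \mapsto x^{-\lambda}]$ we obtain that for any $\lambda \ge 0$ and $a, b > \beta$,
$|a^{-\lambda} - b^{-\lambda}| \le \lambda \beta^{-\lambda-1} |a - b|$. So we can write for
$\varepsilon < D_1/2$,

$$\Big| \big( d_{h,n}(x) d_{h,n}(X_j) \big)^{-\lambda} - \big( p_h(x) p_h(X_j) \big)^{-\lambda} \Big|
\le \lambda (D_1 - \varepsilon)^{-2\lambda-2} |d_{h,n}(x) d_{h,n}(X_j) - p_h(x) p_h(X_j)|
\le 2 \lambda (D_1 - \varepsilon)^{-2\lambda-2} (D_2 + \varepsilon)\, \varepsilon := C \varepsilon.$$

Noting that for $hR_k \le \kappa/2$ by Lemma 16 $d_M(x,y) \le 2hR_k$ for all
$y \in B_{\mathbb{R}^d}(x, hR_k) \cap M$,

$$|\hat{B} - B| \le \frac{1}{n} \sum_{j=1}^n k_h(\|i(x) - i(X_j)\|^2) |f(x) - f(X_j)|\, C \varepsilon
+ \Big| \frac{1}{n} \sum_{j=1}^n k_h(\|i(x) - i(X_j)\|^2)(f(x) - f(X_j)) [p_h(x) p_h(X_j)]^{-\lambda} - B \Big|$$
$$\le 2 C \|k\|_\infty R_k \sup_{y \in M} \|\nabla f\|_{T_yM}\, h \varepsilon + h \varepsilon.$$

We have proven that there exists a constant $C > 1$ such that for any $0 < h < \frac{\kappa}{2 R_k}$
and $\frac{2\|k\|_\infty}{n h^m} < \varepsilon < 1/C$,

$$|(A_{\lambda,h,n}\, g)(x) - (A_{\lambda,h}\, g)(x)| \le C'' h \varepsilon,$$

with probability at least $1 - C n e^{-\frac{n h^m \varepsilon^2}{C}}$. □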

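This leads us to our first main result for the random walk and the unnormalized graph
Laplacian.

Theorem 27 (Pointwise consistency of $\Delta^{(rw)}_{\lambda,h,n}$ and $\Delta^{(u)}_{\lambda,h,n}$)
Suppose the standard assumptions hold. Furthermore, let $k$ be a kernel with compact support
on $[0, R_k^2]$. Let $x \in M \setminus \partial M$, $\lambda \in \mathbb{R}$. Then for any
$f \in C^3(M)$ there exists a constant $C$ such that for any
$\frac{2\|k\|_\infty}{n h^{m+1}} < \varepsilon < 1/C$, $0 < h < h_{\max}$, with probability at least
$1 - C n e^{-\frac{n h^{m+2} \varepsilon^2}{C}}$,

$$|(\Delta^{(rw)}_{\lambda,h,n} f)(x) - (\Delta^{(rw)}_{\lambda,h} f)(x)| \le \varepsilon,$$
$$|(\Delta^{(u)}_{\lambda,h,n} f)(x) - (\Delta^{(u)}_{\lambda,h} f)(x)| \le \varepsilon.$$

The upper bound on $\varepsilon$ is here not necessary but allows to write the bound more compactly.
Define $s = 2(1 - \lambda)$. Then if $h \to 0$ and $n h^{m+2} / \log n \to \infty$,

$$\lim_{n \to \infty} (\Delta^{(rw)}_{\lambda,h,n} f)(x) = -\frac{C_2}{2 C_1} (\Delta_s f)(x) \quad \text{almost surely},$$
$$\lim_{n \to \infty} (\Delta^{(u)}_{\lambda,h,n} f)(x) = -\frac{C_2}{2 C_1^{2\lambda}}\, p(x)^{1-2\lambda} (\Delta_s f)(x) \quad \text{almost surely},$$

in particular, under the above conditions,

$$\Big| (\Delta^{(rw)}_{\lambda,h,n} f)(x) + \frac{C_2}{2 C_1} (\Delta_s f)(x) \Big| = O(h) + O\Big( \sqrt{\frac{\log n}{n h^{m+2}}} \Big) \quad \text{a.s.},$$
$$\Big| (\Delta^{(u)}_{\lambda,h,n} f)(x) + \frac{C_2}{2 C_1^{2\lambda}}\, p(x)^{1-2\lambda} (\Delta_s f)(x) \Big| = O(h) + O\Big( \sqrt{\frac{\log n}{n h^{m+2}}} \Big) \quad \text{a.s.}$$

The optimal rate for $h(n)$ is $h = O\big( (\log n / n)^{\frac{1}{m+4}} \big)$.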
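The optimal rate results from balancing the bias term $O(h)$ against the variance term $O\big(\sqrt{\log n / (n h^{m+2})}\big)$. The sketch below illustrates this balance; the function names are hypothetical and all constants hidden in the $O$-terms are set to one, which is an assumption made purely for illustration.

```python
import math

def bias_order(h):
    """Order of the bias term, O(h), with the constant set to 1."""
    return h

def variance_order(n, h, m):
    """Order of the variance term, sqrt(log n / (n h^(m+2))), with the constant set to 1."""
    return math.sqrt(math.log(n) / (n * h ** (m + 2)))

def rate_optimal_bandwidth(n, m):
    """Bandwidth that equates both orders: h = (log n / n)^(1 / (m + 4))."""
    return (math.log(n) / n) ** (1.0 / (m + 4))

if __name__ == "__main__":
    m = 2  # intrinsic dimension of the submanifold (illustrative choice)
    for n in (10**3, 10**5, 10**7):
        h = rate_optimal_bandwidth(n, m)
        print(n, h, bias_order(h), variance_order(n, h, m))
```

Note that the rate depends only on the intrinsic dimension $m$ of the submanifold and not on the dimension $d$ of the ambient space.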
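Proof: In Equation (8) it was shown that

$$(\Delta^{(rw)}_{\lambda,h,n} f)(x) = \frac{1}{h^2}\, \frac{(A_{\lambda,h,n}\, g)(x)}{d_{\lambda,h,n}(x)}, \qquad
(\Delta^{(u)}_{\lambda,h,n} f)(x) = \frac{1}{h^2} (A_{\lambda,h,n}\, g)(x),$$

where $g(y) := f(x) - f(y)$. Since $f$ is Lipschitz we can directly apply Proposition 26 so
that for the unnormalized Laplacian we get with probability $1 - C n e^{-\frac{n h^{m+2} \varepsilon^2}{C}}$,

$$|(\Delta^{(u)}_{\lambda,h,n} f)(x) - (\Delta^{(u)}_{\lambda,h} f)(x)| \le \varepsilon.$$

For the random walk Laplacian $\Delta^{(rw)}_{\lambda,h,n}$ we work on the event where
$|d_{\lambda,h,n} - d_{\lambda,h}| \le h \varepsilon$, where $\varepsilon \le \frac{1}{2h} d_{\lambda,h}$.
This holds by Proposition 26 with probability $1 - C n e^{-\frac{n h^{m+2} \varepsilon^2}{C}}$.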
Moreover, note that by Lemmas 1161 and 1121 for hRj. < min{K;/2, i?o}) we have 



Ux,h9\ < 



I oo ir II oo 



2L{f)hRk. 



Using Proposition [26] for Ax^h,ng and the bounds of Ph{x) from Lemma 03 
|(AiTl/)(x)-(Ai™)/)(x) 



1 



{Ax,h,ng){3^) _ (A\,hg)(x) 



< 



1 f\{Ax,h,n9){x) - iAx,h9){x)\ 



< 



/l2 

2D. 



2A 



D, 



dx,h,n{x) 

Df 



+ {Ax,hg){x, 



\dx,h,n{x) - dx,h{x)\ 



I oo ir lloo 



dx,h,n{x)dx,h{x) 
2L{f)Rke:=Ce, 



with probability $1 - C n e^{-\frac{n h^{m+2} \varepsilon^2}{C}}$. By Theorem 23 and Corollary 24 we
have for $s = 2(1 - \lambda)$,

$$\Big| (\Delta^{(rw)}_{\lambda,h} f)(x) - \Big[ -\frac{C_2}{2 C_1} (\Delta_s f)(x) \Big] \Big| \le C h, \qquad
\Big| (\Delta^{(u)}_{\lambda,h} f)(x) - \Big[ -\frac{C_2}{2 C_1^{2\lambda}}\, p(x)^{1-2\lambda} (\Delta_s f)(x) \Big] \Big| \le C h.$$

Combining both results together with the Borel-Cantelli Lemma yields almost sure
convergence. The optimal rate for $h(n)$ follows by equating both order terms. □

Using the relationship between the unnormalized and the normalized Laplacian the pointwise
consistency can be easily derived. However, the conditions for convergence are slightly
stronger since the Laplacian is applied to the function $f / \sqrt{d_{\lambda,h,n}}$.

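Theorem 28 (Pointwise consistency of $\Delta^{(n)}_{\lambda,h,n}$) Suppose that the standard
assumptions hold. Furthermore, let $k$ be a kernel with compact support on $[0, R_k^2]$. Let
$x \in M \setminus \partial M$, $\lambda \in \mathbb{R}$. Then for any $f \in C^3(M)$ there exists a
constant $C$ such that for any $\frac{2\|k\|_\infty}{n h^{m+2}} < \varepsilon < 1/C$, $0 < h < h_{\max}$,
with probability at least $1 - C n e^{-\frac{n h^{m+4} \varepsilon^2}{C}}$,

$$|(\Delta^{(n)}_{\lambda,h,n} f)(x) - (\Delta^{(n)}_{\lambda,h} f)(x)| \le \varepsilon.$$

Define $s = 2(1 - \lambda)$. Then if $h \to 0$ and $n h^{m+4} / \log n \to \infty$,

$$\lim_{n \to \infty} (\Delta^{(n)}_{\lambda,h,n} f)(x) = -\frac{C_2}{2 C_1}\, p(x)^{\frac{1}{2} - \lambda}\, \Delta_s \Big( \frac{f}{p^{\frac{1}{2} - \lambda}} \Big)(x) \quad \text{almost surely}.$$

Proof: We reduce the case of $\Delta^{(n)}_{\lambda,h,n}$ to the case of $\Delta^{(u)}_{\lambda,h,n}$.
We work on the event where

$$|d_{\lambda,h,n}(x) - d_{\lambda,h}(x)| \le h^2 \varepsilon, \qquad |d_{\lambda,h,n}(X_i) - d_{\lambda,h}(X_i)| \le h^2 \varepsilon, \quad \forall\, i = 1, \ldots, n.$$

From Proposition 26 we know that this holds with probability at least
$1 - C n e^{-\frac{n h^{m+4} \varepsilon^2}{C}}$. Working on this event we get by a similar argumentation
as in the proof of Theorem 27 that there exists a constant $C'$ such that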



(Ai:t/)(x) 



1 



ix) 



1 



dx,h,n{x)f{x) 



+Y,Kh{x,Xi)f{x, 



1=1 



dx,h('-c) dx,h,n{x) 

<C'e 



Noting that $f / \sqrt{d_{\lambda,h}}$ is Lipschitz since $f$ and $d_{\lambda,h}$ are Lipschitz and
upper and lower bounded on $M \cap B_{\mathbb{R}^d}(x, hR_k)$, one can apply Theorem 27 to derive
the first statement. The second statement follows by Corollary 24. □

A Basic Concepts of Differential Geometry

In this section we introduce the necessary basics of differential geometry, in particular
normal coordinates and submanifolds in $\mathbb{R}^d$, used in this paper. Note that the definition
of the Riemann curvature tensor varies across textbooks which can result in sign-errors.
Throughout the paper we use the convention of Lee (1997).

A.1 Basics



Definition 29 A $d$-dimensional manifold $X$ with boundary is a topological (Hausdorff)
space such that every point has a neighborhood homeomorphic to an open subset
of $\mathbb{H}^d = \{(x^1, \ldots, x^d) \in \mathbb{R}^d \,|\, x^1 \ge 0\}$. A chart (or local coordinate
system) $(U, \phi)$ of a manifold $X$ is an open set $U \subset X$ together with a homeomorphism
$\phi : U \to V$ of $U$ onto an open subset $V \subset \mathbb{H}^d$. The coordinates
$(x^1, \ldots, x^d)$ of $\phi(x)$ are called the coordinates of $x$ in the chart $(U, \phi)$.
A $C^r$-atlas $\mathcal{A}$ is a collection of charts

$$\mathcal{A} = \cup_{\alpha \in I} (U_\alpha, \phi_\alpha),$$

where $I$ is an index set, such that $X = \cup_{\alpha \in I} U_\alpha$ and for any $\alpha, \beta \in I$
the corresponding transition map

$$\phi_\beta \circ \phi_\alpha^{-1} \big|_{\phi_\alpha(U_\alpha \cap U_\beta)} : \phi_\alpha(U_\alpha \cap U_\beta) \to \mathbb{H}^d$$

is $r$-times continuously differentiable. A smooth manifold with boundary is a manifold
with boundary with a $C^\infty$-atlas.

For more technical details behind the definition of a manifold with boundary we refer to 
Le3 \2mi ). Note that the boundary dM of M is a (d — l)-dimensional manifold without 



boundary. In textbooks one often only finds the definition of a manifold without boundary
which can be easily recovered from the above definition by replacing $\mathbb{H}^d$ with
$\mathbb{R}^d$. The interior $M \setminus \partial M$ of the manifold $M$ is a manifold without boundary.

Definition 30 A subset $M$ of a $d$-dimensional manifold $X$ is an $m$-dimensional
submanifold $M$ with boundary if every point $x \in M$ is in the domain of a chart $(U, \phi)$
of $X$ such that

$$\phi : U \cap M \to \mathbb{H}^m \times a, \qquad \phi(x) = (x^1, \ldots, x^m, a^1, \ldots, a^{d-m}),$$

where $a$ is a fixed element in $\mathbb{R}^{d-m}$. $X$ is called the ambient space of $M$.

This definition excludes irregular cases like intersecting submanifolds or self-approaching
submanifolds. In the following it is more appropriate to take the following point of view.
Let $M$ be an $m$-dimensional manifold. The smooth mapping $i : M \to X$ is said to be
an immersion if $i$ is differentiable and the differential of $i$ has rank $m$ everywhere. An
injective immersion is called an embedding if it is a homeomorphism onto its image. In
this case $i(M)$ is a submanifold of $X$. If $M$ is compact and $i$ is an injective immersion,
then $i$ is an embedding. This is not the case if $M$ is not compact since $i(M)$ can be
self-approaching.

Definition 31 A Riemannian manifold $(M, g)$ is a smooth manifold $M$ together with
a tensor of type $(0, 2)$, called the metric tensor $g$, at each $p \in M$, such that $g$ defines an
inner product on the tangent space $T_pM$ which varies smoothly over $M$. The volume form
induced by $g$ is given in local coordinates as $dV = \sqrt{\det g}\; dx^1 \wedge \ldots \wedge dx^m$.
$dV$ is uniquely determined by $dV(e_1, \ldots, e_m) = 1$ for any oriented orthonormal basis
$e_1, \ldots, e_m$ in $T_xM$.

A tensor $T$ of type $(m, n)$ is a multilinear form
$T_pM \times \ldots \times T_pM \times T_p^*M \times \ldots \times T_p^*M \to \mathbb{R}$
($n$-times $T_pM$, $m$-times $T_p^*M$).

The metric tensor induces for every $p \in M$ an isometric isomorphism between the tangent
space $T_pM$ and its dual $T_p^*M$. A submanifold $M$ of a Riemannian manifold $(X, g)$ has a
natural Riemannian metric $h$ induced from $X$ in the following way. Let $i : M \to X$ be
an embedding so that $M$ is a submanifold of $X$. Then one can induce a metric $h$ on $M$
using the mapping $i$, namely $h = i^* g$, where $i^* : T^*_{i(x)}X \to T^*_xM$ is the pull-back of the
differentiable mapping $i$. In this case $i$ trivially is an isometric embedding of $(M, h)$ into
$(X, g)$. In the paper we always use on the submanifold $M$ the metric induced from $\mathbb{R}^d$.

Definition 32 The Laplace-Beltrami operator $\Delta_M$ of a Riemannian manifold is
defined as $\Delta_M = \mathrm{div}(\mathrm{grad})$. For a twice differentiable function
$f : M \to \mathbb{R}$ it is explicitly given as

$$\Delta_M f = \frac{1}{\sqrt{\det g}} \frac{\partial}{\partial x^j} \Big( \sqrt{\det g}\; g^{ij} \frac{\partial f}{\partial x^i} \Big),$$

where $g^{ij}$ are the components of the inverse of the metric tensor $g = g_{ij}\, dx^i \otimes dx^j$.



A.2 Normal Coordinates

Since in the proofs we use normal coordinates, we give here a short introduction.
Intuitively, normal coordinates around a point $p$ of an $m$-dimensional Riemannian manifold
$M$ are coordinates chosen such that $M$ looks around $p$ like $\mathbb{R}^m$ in the best possible
way. This is achieved by adapting the coordinate lines to geodesics through the point $p$. The
reference for the following material is the book of Jost (2002). We denote by $c_v$ the unique
geodesic starting at $c(0) = p$ with tangent vector $\dot{c}(0) = v$ ($c_v$ depends smoothly on $p$
and $v$).

Definition 33 Let $M$ be a Riemannian manifold, $p \in M$, and
$V_p = \{v \in T_pM,\ c_v \text{ defined on } [0,1]\}$, then $\exp_p : V_p \to M$, $v \mapsto c_v(1)$, is
called the exponential map of $M$ at $p$.

It can be shown that $\exp_p$ maps a neighborhood of $0 \in T_pM$ diffeomorphically onto a
neighborhood $U$ of $p \in M$. This justifies the definition of normal coordinates.

Definition 34 Let $U$ be a neighborhood of $p$ in $M$ such that $\exp_p$ is a diffeomorphism.
The local coordinates defined by the chart $(U, \exp_p^{-1})$ are called normal coordinates at $p$.

Note that in $T_pM \simeq \mathbb{R}^m \supset \exp_p^{-1}(U)$ we use always an orthonormal basis. The
injectivity radius describes the largest ball around $p$ such that normal coordinates can be
introduced.

Definition 35 Let $M$ be a Riemannian manifold. The injectivity radius of $p \in M$ is

$$\mathrm{inj}(p) = \sup\{\rho > 0,\ \exp_p \text{ is defined on } B_{\mathbb{R}^m}(0, \rho) \text{ and injective}\}.$$

$T_x^*M$ is the dual of the tangent space $T_xM$. Every differentiable mapping $i : M \to X$ induces a
pull-back $i^* : T^*_{i(x)}X \to T^*_xM$. Let $u \in T_xM$, $w \in T^*_{i(x)}X$ and denote by $i'$ the
differential of $i$. Then $i^*$ is defined by $(i^* w)(u) = w(i' u)$.

It can be shown that $\mathrm{inj}(p) > 0$, $\forall p \in M \setminus \partial M$. Moreover, for compact
manifolds without boundary there exists a lower bound $\mathrm{inj}_{\min} > 0$ such that
$\mathrm{inj}(p) \ge \mathrm{inj}_{\min}$, $\forall p \in M$. However, for manifolds with boundary one has
$\mathrm{inj}(p_n) \to 0$ for any sequence of points $p_n$ with limit on the boundary. The motivation
for introducing normal coordinates is that the geometry is particularly simple in these
coordinates. The following theorem makes this more precise.

Theorem 36 In normal coordinates around $p$ one has for the Riemannian metric $g$ and
the Laplace-Beltrami operator $\Delta_M$ applied to a function $f$ at $0 = \exp_p^{-1}(p)$,

$$g_{ij}(0) = \delta_{ij}, \qquad \frac{\partial}{\partial x^k} g_{ij}(0) = 0, \qquad (\Delta_M f)(0) = \sum_i \frac{\partial^2 f}{\partial (x^i)^2}(0).$$

The second derivatives of the metric tensor cannot be made to vanish in general. There
curvature effects come into play which cannot be deleted by a coordinate transformation.
To summarize, normal coordinates with center $p$ achieve that, up to first order, the
geometry of $M$ at point $p$ looks like that of $\mathbb{R}^m$.

A.3 The Second Fundamental Form

In this section we assume that $M$ is an isometrically embedded submanifold of a manifold
$X$. At each point $p \in M$ one can decompose the tangent space $T_pX$ into a subspace $T_pM$,
which is the tangent space to $M$, and the orthogonal normal space $N_pM$. In the same way
one can split the covariant derivative of $X$ at $p$, $\tilde{\nabla}_U V$, into a component tangent
$(\tilde{\nabla}_U V)^\top$ and normal $(\tilde{\nabla}_U V)^\perp$ to $M$.

Definition 37 The second fundamental form $\Pi$ of an isometrically embedded
submanifold $M$ of $X$ is defined as

$$\Pi : T_pM \otimes T_pM \to N_pM, \qquad \Pi(U, V) = (\tilde{\nabla}_U V)^\perp.$$

The following theorem, see Lee (1997), then shows that the covariant derivative of $M$ at
$p$ is nothing else than the projection of the covariant derivative of $X$ at $p$ onto $T_pM$.

Theorem 38 (Gauss Formula) Let $U, V$ be vector fields on $M$ which are arbitrarily
extended to $X$, then the following holds along $M$

$$\tilde{\nabla}_U V = \nabla_U V + \Pi(U, V),$$

where $\tilde{\nabla}$ is the covariant derivative of $X$ and $\nabla$ the covariant derivative of $M$.

The second fundamental form connects also the curvature tensors of $X$ and $M$.

Theorem 39 (Gauss equation) For any $U, V, W, Z \in T_pM$ the following equation holds

$$\tilde{R}(U, V, W, Z) = R(U, V, W, Z) - \langle \Pi(U, Z), \Pi(V, W) \rangle + \langle \Pi(U, W), \Pi(V, Z) \rangle,$$

where $\tilde{R}$ and $R$ are the Riemann curvature tensors of $X$ and $M$.


The Riemann curvature tensor of a Riemannian manifold $M$ is defined as
$R : T_pM \otimes T_pM \otimes T_pM \to T_pM$,
$R(X, Y)Z = \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X,Y]} Z$.
In local coordinates $x^i$, $R_{ijk}{}^{l}\, \partial_l = R(\partial_i, \partial_j)\partial_k$ and
$R_{ijkm} = g_{lm} R_{ijk}{}^{l}$.

In this paper we derive a relationship between distances in $M$ and the corresponding
distances in $X$. Since Riemannian manifolds are length spaces and therefore the distance
is induced by length minimizing curves (locally the geodesics), it is of special interest to
connect properties of curves of $M$ with respect to $X$. Applying the Gauss Formula to a
curve $c(t) : (t_0, t_1) \to M$ yields the following

$$\tilde{D}_t V = D_t V + \Pi(V, \dot{c}),$$

where $\tilde{D}_t = \dot{c}^a \tilde{\nabla}_a$ and $\dot{c}$ is the tangent vector field to the curve
$c(t)$. Now let $c(t)$ be a geodesic parameterized by arc-length, that is with unit-speed, then its
acceleration fulfills $D_t \dot{c} = \dot{c}^a \nabla_a \dot{c}^b = 0$ (however that is only true
locally in the interior of $M$; globally, if $M$ has boundary, length minimizing curves may behave
differently, in particular if a length minimizing curve goes along the boundary its acceleration
can be non-zero), and one gets for the acceleration in the ambient space

$$\tilde{D}_t \dot{c} = \Pi(\dot{c}, \dot{c}).$$

In our setting where $X = \mathbb{R}^d$ the term $\tilde{D}_t \dot{c}$ is just the ordinary
acceleration $\ddot{c}$ in $\mathbb{R}^d$. Remember that the norm of the acceleration vector equals
the curvature of the curve at that point, that is the inverse of its radius of curvature
(if $c$ is parameterized by arc-length). Due to this connection it becomes
more apparent why the second fundamental form is often called the extrinsic curvature
(with respect to $X$).
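The relation between acceleration and curvature can be illustrated on a circle of radius $r$ in $\mathbb{R}^2$, which is a geodesic of itself. The following sketch (hypothetical helper names; the second derivative is approximated by central differences) checks that for an arc-length parameterization the norm of $\ddot{c} = \Pi(\dot{c}, \dot{c})$ equals $1/r$.

```python
import math

def circle_arc_length(t, radius):
    """Unit-speed parameterization of a circle of the given radius."""
    return (radius * math.cos(t / radius), radius * math.sin(t / radius))

def acceleration_norm(curve, t, step=1e-4):
    """Norm of the second derivative of a planar curve via central differences."""
    x_m, y_m = curve(t - step)
    x_0, y_0 = curve(t)
    x_p, y_p = curve(t + step)
    ax = (x_p - 2.0 * x_0 + x_m) / step**2
    ay = (y_p - 2.0 * y_0 + y_m) / step**2
    return math.hypot(ax, ay)

if __name__ == "__main__":
    for radius in (0.5, 1.0, 2.0):
        curve = lambda t, r=radius: circle_arc_length(t, r)
        print(radius, acceleration_norm(curve, 0.7))
```

Large extrinsic curvature therefore means that Euclidean and geodesic distances start to differ already at small scales, which is the source of the curvature term $S(x)$ in Proposition 20.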

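The following Lemma shows that the second fundamental form $\Pi$ of an isometrically
embedded submanifold $M$ of $\mathbb{R}^d$ is in normal coordinates just the Hessian of $i$.

Lemma 40 Let $e_\alpha$, $\alpha = 1, \ldots, d$ denote an orthonormal basis of $T_{i(x)}\mathbb{R}^d$
then the second fundamental form of $M$ in normal coordinates $y$ is given as:

$$\Pi(\partial_{y^i}, \partial_{y^j})\big|_0 = \frac{\partial^2 i^\alpha}{\partial y^i \partial y^j}\, e_\alpha.$$

Proof: Let $\tilde{\nabla}$ be the flat connection of $\mathbb{R}^d$ and $\nabla$ the connection of $M$.
Then by Theorem 38,
$\Pi(\partial_{y^i}, \partial_{y^j}) = \tilde{\nabla}_{i_* \partial_{y^i}} (i_* \partial_{y^j}) - \nabla_{\partial_{y^i}} \partial_{y^j}
= \partial_{y^i} \big( \frac{\partial i^\alpha}{\partial y^j} \big) e_\alpha = \frac{\partial^2 i^\alpha}{\partial y^i \partial y^j}\, e_\alpha$,
where the second equality follows from the flatness of $\tilde{\nabla}$ and $\Gamma^i_{jk}\big|_0 = 0$ in
normal coordinates. □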



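Note that if $c$ is parameterized by arc-length, $\dot{c}$ is tangent to $M$, that is in particular
$\|\dot{c}\|_{T_xM} = \|\dot{c}\|_{\mathbb{R}^d}$.

B Proofs and Lemmas

Proof of Proposition 20

The following lemmas are needed in the proof.

Lemma 41 If the kernel $k : \mathbb{R}_+ \to \mathbb{R}$ satisfies Assumptions 18, then

$$\int_{\mathbb{R}^m} \frac{\partial k}{\partial x}(\|u\|^2)\, u^i u^j u^k u^l\, du = -\frac{1}{2} C_2 \big[ \delta^{ij}\delta^{kl} + \delta^{ik}\delta^{jl} + \delta^{il}\delta^{jk} \big]. \qquad (17)$$

Proof: Note first that for a function $f(\|u\|^2)$ one has
$\frac{\partial f}{\partial \|u\|^2} = \frac{\partial f}{\partial u_i^2}$. The rest follows from
partial integration.

$$\int_{-\infty}^{\infty} \frac{\partial k}{\partial x}(u^2)\, u^2\, du = \int_0^\infty \frac{\partial k}{\partial x}(v) \sqrt{v}\, dv
= \big[ k(v) \sqrt{v} \big]_0^\infty - \int_0^\infty k(v) \frac{1}{2\sqrt{v}}\, dv = -\frac{1}{2} \int_{-\infty}^{\infty} k(u^2)\, du,$$

where $[k(v)\sqrt{v}]_0^\infty = 0$ due to the boundedness and exponential decay of $k$.
In the same way one can derive
$\int_{-\infty}^{\infty} \frac{\partial k}{\partial x}(u^2)\, u^4\, du = -\frac{3}{2} \int_{-\infty}^{\infty} k(u^2) u^2\, du$.
The result follows by noting that since $k$ is an even function only integration over even powers of
coordinates will be non-zero. □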
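The two one-dimensional identities used in the proof can be confirmed numerically. The sketch below assumes the kernel $k(x) = e^{-x}$, so that $\frac{\partial k}{\partial x}(x) = -e^{-x}$; this choice and the helper names are illustrative assumptions, and the integrals are truncated to $[-10, 10]$ where the integrands are negligible.

```python
import math

def integrate(func, lower=-10.0, upper=10.0, n=20000):
    """Composite midpoint rule on [lower, upper]."""
    step = (upper - lower) / n
    return sum(func(lower + (i + 0.5) * step) for i in range(n)) * step

def k(x):
    """Assumed kernel profile k(x) = exp(-x)."""
    return math.exp(-x)

def dk(x):
    """Derivative of the assumed kernel profile."""
    return -math.exp(-x)

# int k'(u^2) u^2 du = -(1/2) int k(u^2) du
lhs_second = integrate(lambda u: dk(u * u) * u**2)
rhs_second = -0.5 * integrate(lambda u: k(u * u))

# int k'(u^2) u^4 du = -(3/2) int k(u^2) u^2 du
lhs_fourth = integrate(lambda u: dk(u * u) * u**4)
rhs_fourth = -1.5 * integrate(lambda u: k(u * u) * u**2)

if __name__ == "__main__":
    print(lhs_second, rhs_second)
    print(lhs_fourth, rhs_fourth)
```

For $m = 1$ the second identity is exactly equation (17) with $i = j = k = l$, since the bracket of Kronecker deltas then equals 3.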



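Lemma 42 Let $k$ satisfy Assumption 18 and let $V_{ijkl}$ be a given tensor. Assume now
$\|z\|^2 \ge \|z\|^2 + V_{ijkl} z^i z^j z^k z^l + \beta(z) \|z\|^5 \ge \frac{1}{4} \|z\|^2$ on
$B(0, r_{\min}) \subset \mathbb{R}^m$, where $\beta(z)$ is continuous and $\beta(z) \sim O(1)$ as
$z \to 0$. Then there exists a constant $C$ and a $h_0 > 0$ such that for all $h < h_0$ and all
$f \in C^3(B(0, r_{\min}))$,

$$\Big| \int_{B(0, r_{\min})} k_h\Big( \frac{\|z\|^2 + V_{ijkl} z^i z^j z^k z^l + \beta(z) \|z\|^5}{h^2} \Big) f(z)\, dz
- \Big( C_1 f(0) + C_2 \frac{h^2}{2} \Big[ (\Delta f)(0) - f(0) \sum_{i,k} \big( V_{iikk} + V_{ikik} + V_{ikki} \big) \Big] \Big) \Big| \le C h^3,$$

where $C$ is a constant depending on $k$, $r_{\min}$, $V_{ijkl}$ and $\|f\|_{C^3}$.

Proof: As a first step we do a Taylor expansion of the kernel around $\|z\|^2 / h^2$: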



kh 



\z\\ +ri 
~h? 



kh 



j dx 



+ 



d'^khix) 




dx"^ 


||z||2(i_e)+e^ 



where in the last term $0 \le \theta(z) \le 1$. We then decompose the integral:



kh 



zf + Vijkiz'zh''z^ + f3{z) \\z\ 



h^ 



+ 



dkh 
dx 



/l2 

Vjjki z'z^z'^z^ 

/l2 



f{z)dz 



/(0) + (V/|o,z) + 



1 d'f 



2 dz^dz^ 



dz + Y, 



OLi 



i=0 



where we define the five error terms $\alpha_i$ as:

|5 



oto 



OL2 



as 



B(0,rmin) 



dkh 
dx 



P{z)\\z \ 

h^ 



-f{z)dz, 



L 



B{0,rmin) 



d^kh 




dx"^ 


||z||2(l_9) + e^ 



{y,jkiz'ziz^z' + (3{z)\\zf'y 



f{z)dz, 



B{0,rmin) 



A-B(0,r^i„) 



h^ 



6 dz'^dz^dz^ 



/l2 



/(0)+ V/Lz +- 



1 d^f 



2 dz^dz^ 



z'z^ dz 
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a4 



AS{0,r-^i„) 



dkh 

dx 



1 d'^f 

/(o) + (v/Lz> + --'^^ 



2 dz^dzi 



z^z^ \dz, 



where in ai, r/ = Vijkiz^ z^ z^ z^ + (3{z) WzW^ . With A;(||z||^) = 0, Vi, and 
/iRm fc(||z||^) 2j Zjdz = if i ^ J, and Lemma HT] the main term simphfies to: 



/i2 



+ 



dkh{x) 



dx 



/l2 



^y,,fc;nVW)(/(0) + 



2 dz^dzo 



z^z^ ]dz 





2 

2 "'^5(^0^ 



1,2 ^ 



where the $O(h^4)$ term is finite due to the exponential decay of $k$ and depends on $k$, $r_{\min}$, $V_{ijkl}$ and $\|f\|_{C^3}$. Now we can upper bound the remaining error terms $\alpha_i$, $i = 0, \ldots, 4$. For the argument of the kernel in $\alpha_1$ and $\alpha_2$ we have by our assumptions on $B(0, r_{\min})$:

$$\frac{\|z\|^2}{h^2} \ge \frac{\|z\|^2 + V_{ijkl}z^iz^jz^kz^l + \beta(z)\|z\|^5}{h^2} \ge \frac{\|z\|^2}{4h^2}.$$



Note that this inequality implies that $\beta$ is uniformly bounded on $B(0, r_{\min})$ in terms of $r_{\min}$ and $V_{ijkl}$. Moreover, for small enough $h$ we have $\frac{r_{\min}}{h} \ge 2\sqrt{A}$ (see Assumption 18 for the definition of $A$) so that we can use the exponential decay of $k$ for $\alpha_3$ and $\alpha_4$.



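$$|\alpha_0| \le h^3\, \|f\|_{C^3}\int_{B(0,\frac{r_{\min}}{h})}\left|\frac{\partial k}{\partial x}\big(\|u\|^2\big)\right| |\beta(hu)|\,\|u\|^5\,du.$$

Since $\frac{\partial k}{\partial x}$ is bounded and has exponential decay, one has $|\alpha_0| \le K_0 h^3$ where $K_0$ depends on $k$, $r_{\min}$ and $\|f\|_{C^3}$.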



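$$|\alpha_1| \le \int_{B(0,r_{\min})}\left|\frac{\partial^2 k_h}{\partial x^2}\!\left(\frac{\|z\|^2(1-\theta)+\theta\eta}{h^2}\right)\right| \frac{\big(V_{ijkl}z^iz^jz^kz^l + \beta(z)\|z\|^5\big)^2}{h^4}\,|f(z)|\,dz$$
$$\le \|f\|_{C^3}\, h^4 \int_{B(0,\frac{r_{\min}}{h})}\left|\frac{\partial^2 k}{\partial x^2}\big(\|u\|^2(1-\theta)+\theta\eta\big)\right|\Big(m^2\max_{i,j,k,l}|V_{ijkl}|\,\|u\|^4 + h\,|\beta(hu)|\,\|u\|^5\Big)^2 du.$$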



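First suppose $\frac{r_{\min}}{h} \le 2\sqrt{A}$; then the integral is bounded since the integrands are bounded on $B(0, \frac{r_{\min}}{h})$. Now suppose $\frac{r_{\min}}{h} > 2\sqrt{A}$ and decompose $B(0, \frac{r_{\min}}{h})$ as $B(0, \frac{r_{\min}}{h}) = B(0, 2\sqrt{A}) \cup B(0, \frac{r_{\min}}{h}) \setminus B(0, 2\sqrt{A})$. On $B(0, 2\sqrt{A})$ the integral is finite since $\left|\frac{\partial^2 k}{\partial x^2}\right|$ is bounded, and on the complement the integral is also finite since $\left|\frac{\partial^2 k}{\partial x^2}\right|$ has exponential decay, because by assumption

$$\|u\|^2 (1 - \theta(hu)) + \theta(hu)\,\eta(hu) \ge \frac{1}{4}\|u\|^2 \ge A.$$

Therefore there exists a constant $K_1$ such that $|\alpha_1| \le K_1 h^4$.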

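$$|\alpha_2| \le \int_{B(0,r_{\min})} k_h\!\left(\frac{\|z\|^2}{4h^2}\right)\left|\frac{1}{6}\frac{\partial^3 f}{\partial z^i\partial z^j\partial z^k}(\theta z)\,z^iz^jz^k\right|dz \le \frac{m^{3/2}}{6}\,\|f\|_{C^3}\,h^3\int_{B(0,\frac{r_{\min}}{h})} k\!\left(\frac{\|u\|^2}{4}\right)\|u\|^3\,du \le K_2 h^3,$$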






2^ • 1 9V 



2 

2 







<c||/||c3 J e"^(l + m/i2||zf)dz<ce""^(^^y (l + m/i^ 



|«4| < -f^4 



\B(o,n„i„) 

-2: + u 



5^ |U|,2, „.„4 , „.„6 



dx V /l2 

IR'"\B(0,rmin) 

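where the constant in the last bound depends on $\max_{i,j,k,l}|V_{ijkl}|$ and $\|f\|_{C^3}$. Now one has: $e^{-\xi^2/h^2} \le h^s/\xi^s$ for $h \le \xi/s$. In particular, the exponentially small factors in the bounds on $\alpha_3$ and $\alpha_4$ are at most a constant times $h^3$ once $h$ is small enough relative to $r_{\min}$, so that for $h$ below a threshold $h_0$ all error terms are smaller than a constant times $h^3$, where the constant depends on $k$, $r_{\min}$, $V_{ijkl}$ and $\|f\|_{C^3}$. This finishes the proof. □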
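The elementary inequalities behind the last step, $e^x \ge x^n$ for $x \ge 4n^2$ and its consequence $e^{-\xi^2/h^2} \le h^s/\xi^s$ for $h \le \xi/s$, can be spot-checked numerically. The following is a minimal sketch on a finite grid of sample points; the function names and the sampled values are illustrative choices, not taken from the text.

```python
import math

def exp_dominates_power(n, xs):
    """Check e^x >= x^n on the sample points xs (each assumed to satisfy x >= 4 n^2)."""
    return all(math.exp(x) >= x ** n for x in xs)

def kernel_tail_bound_holds(xi, s, h):
    """Check e^{-xi^2/h^2} <= (h/xi)^s, which is expected to hold whenever h <= xi/s."""
    return math.exp(-(xi / h) ** 2) <= (h / xi) ** s

if __name__ == "__main__":
    for n in range(1, 7):
        samples = [4 * n * n + 0.5 * j for j in range(200)]
        print(n, exp_dominates_power(n, samples))
    for xi in (0.5, 1.0, 2.0):
        for s in (1, 2, 3, 4):
            for t in (1.0, 0.5, 0.1):  # h = t * xi / s, so h <= xi / s
                print(xi, s, t, kernel_tail_bound_holds(xi, s, t * xi / s))
```

Setting $x = \xi^2/h^2$ and $n = s/2$ shows why the threshold $h \le \xi/s$ appears: it is exactly the condition $x \ge 4n^2$.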

Now we are ready to prove Proposition 20.

Proof: Let $\epsilon = \frac{1}{3}\min\{\mathrm{inj}(x), \pi\rho\}$, where $\epsilon$ is positive by the assumptions on $M$. Then we decompose $M$ as $M = B(x,\epsilon) \cup (M \setminus B(x,\epsilon))$ and integrate separately. The integral over $M \setminus B(x,\epsilon)$ can be upper bounded by using the definition of $\delta(x)$ (see Assumption 17) and the fact that $k$ is non-increasing:

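$$\int_M k_h\big(\|i(x) - i(y)\|^2_{\mathbb{R}^d}\big) f(y)\,p(y)\sqrt{\det g}\,dy = \int_{B(x,\epsilon)} k_h\big(\|i(x) - i(y)\|^2_{\mathbb{R}^d}\big) f(y)\,p(y)\sqrt{\det g}\,dy + \int_{M\setminus B(x,\epsilon)} k_h\big(\|i(x) - i(y)\|^2_{\mathbb{R}^d}\big) f(y)\,p(y)\sqrt{\det g}\,dy.$$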

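Since $k$ is non-increasing, we have the following inequality for the integral over $M \setminus B(x,\epsilon)$:

$$\int_{M\setminus B(x,\epsilon)} k_h\big(\|i(x) - i(y)\|^2_{\mathbb{R}^d}\big) f(y)\,p(y)\sqrt{\det g}\,dy \le \frac{1}{h^m}\,k\!\left(\frac{\delta(x)^2}{h^2}\right)\|f\|_\infty.$$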

Since $\delta(x)$ is positive by assumption and $k$ decays exponentially, we can make the upper bound smaller than $h^3$ for small enough $h$. Now we deal with the integral over $B(x,\epsilon)$. Since $\epsilon$ is smaller than the injectivity radius $\mathrm{inj}(x)$, we can introduce normal coordinates $z = \exp^{-1}(y)$ on $B(x,\epsilon)$, so that we can rewrite the integral using Proposition 13 as:



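$$\int_{B(0,\epsilon)} k_h\!\left(\frac{\|z\|^2 - \frac{1}{12}\sum_{\alpha=1}^d \frac{\partial^2 i^\alpha}{\partial z^a\partial z^b}\frac{\partial^2 i^\alpha}{\partial z^c\partial z^d}\, z^az^bz^cz^d + O(\|z\|^5)}{h^2}\right) p(z)\,f(z)\sqrt{\det g}\,dz \qquad (18)$$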
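The quartic correction to the squared Euclidean distance inside the kernel can be illustrated on the unit circle in $\mathbb{R}^2$, where normal coordinates at a point are simply arc length $s$. This is a minimal sketch under that assumption, with hypothetical function names: for the embedding $i(s) = (\cos s, \sin s)$ one has $\|i''(0)\| = 1$, so the expansion predicts $\|i(0) - i(s)\|^2 \approx s^2 - s^4/12$.

```python
import math

def squared_chord(s):
    """Squared Euclidean distance between the unit-circle points at arc lengths 0 and s."""
    return (math.cos(s) - 1.0) ** 2 + math.sin(s) ** 2

def quartic_model(s):
    """Leading terms s^2 - (1/12) * |i''(0)|^2 * s^4, with |i''(0)| = 1 on the unit circle."""
    return s ** 2 - s ** 4 / 12.0

if __name__ == "__main__":
    for s in (0.5, 0.2, 0.1, 0.05):
        gap = abs(squared_chord(s) - quartic_model(s))
        print(f"s={s:5.2f}  gap={gap:.3e}  s^6/360={s**6 / 360:.3e}")
```

On the circle the remainder is in fact of order $s^6$, which is compatible with the $O(\|z\|^5)$ term in the general expansion.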



Footnote (to the inequality $e^{-\xi^2/h^2} \le h^s/\xi^s$ in the proof of Lemma 42): this inequality can be deduced from $e^x \ge x^n$ for all $x \ge 4n^2$.
Footnote (to the choice of $\epsilon$): the factor $1/3$ is needed in Theorem 23.

Using our assumptions, we see that $p f \sqrt{\det g}$ is in $C^3(B(0,\epsilon))$. Moreover, by Corollary 15 one has for $d_M(x,y) \le \pi\rho$, $\frac{1}{2} d_M(x,y) \le \|x - y\| \le d_M(x,y)$. Therefore we can apply Lemma 42 and compute the integral in (18), which results in:

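$$p(x)f(x)\left[C_1 + C_2\frac{h^2}{24}\sum_{\alpha=1}^d \frac{\partial^2 i^\alpha}{\partial z^a\partial z^b}\frac{\partial^2 i^\alpha}{\partial z^c\partial z^d}\big[\delta^{ab}\delta^{cd} + \delta^{ac}\delta^{bd} + \delta^{ad}\delta^{bc}\big]\right] + \frac{h^2}{2}\,C_2\,\Delta_M\big(p f\sqrt{\det g}\big)\Big|_x + O(h^3), \qquad (19)$$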



where we have used that in normal coordinates at $x$ the Laplace-Beltrami operator $\Delta_M$ is given as $\Delta_M f\big|_x = \sum_{i=1}^m \frac{\partial^2 f}{\partial (z^i)^2}\Big|_x$. The second term in the above equation can be evaluated using the Gauss equations, see (Smolyanov et al., 2007, Proposition 6).



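$$\sum_{a,b,c,d=1}^m\sum_{\alpha=1}^d \frac{\partial^2 i^\alpha}{\partial z^a\partial z^b}\frac{\partial^2 i^\alpha}{\partial z^c\partial z^d}\big[\delta^{ab}\delta^{cd}+\delta^{ac}\delta^{bd}+\delta^{ad}\delta^{bc}\big]$$
$$= \sum_{a,b=1}^m\sum_{\alpha=1}^d \frac{\partial^2 i^\alpha}{\partial (z^a)^2}\frac{\partial^2 i^\alpha}{\partial (z^b)^2} + 2\sum_{a,b=1}^m\sum_{\alpha=1}^d \frac{\partial^2 i^\alpha}{\partial z^a\partial z^b}\frac{\partial^2 i^\alpha}{\partial z^a\partial z^b}$$
$$= 2\sum_{a,b=1}^m\sum_{\alpha=1}^d\left(\frac{\partial^2 i^\alpha}{\partial z^a\partial z^b}\frac{\partial^2 i^\alpha}{\partial z^a\partial z^b} - \frac{\partial^2 i^\alpha}{\partial (z^a)^2}\frac{\partial^2 i^\alpha}{\partial (z^b)^2}\right) + 3\sum_{a,b=1}^m\sum_{\alpha=1}^d \frac{\partial^2 i^\alpha}{\partial (z^a)^2}\frac{\partial^2 i^\alpha}{\partial (z^b)^2}$$
$$= 2\sum_{a,b=1}^m\Big(\big\langle\Pi(\partial_{z^a},\partial_{z^b}),\Pi(\partial_{z^a},\partial_{z^b})\big\rangle - \big\langle\Pi(\partial_{z^a},\partial_{z^a}),\Pi(\partial_{z^b},\partial_{z^b})\big\rangle\Big) + 3\,\Big\|\sum_{a=1}^m\Pi(\partial_{z^a},\partial_{z^a})\Big\|^2$$
$$= -2R + 3\,\Big\|\sum_{a=1}^m\Pi(\partial_{z^a},\partial_{z^a})\Big\|^2,$$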



where $R$ is the scalar curvature and we used Lemma 40 in the third equality. Plugging this result into (19) and using from Proposition 13 that $\Delta_M \sqrt{\det g}\,\big|_x = -\frac{1}{3}R$, we are done. □
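The contraction of the second fundamental form used in this proof can be checked on a concrete example. The sketch below is an illustration only: it assumes the second fundamental form is supplied as a nested list `Pi[a][b]` of normal vectors (a hypothetical data layout), and it uses the unit sphere $S^2 \subset \mathbb{R}^3$, where $\Pi(\partial_a, \partial_b) = -\delta_{ab}\,n$ and the scalar curvature is $R = 2$.

```python
def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(x * y for x, y in zip(u, v))

def delta(i, j):
    """Kronecker delta."""
    return 1.0 if i == j else 0.0

def contraction_terms(Pi):
    """Return (lhs, rhs, R) for a symmetric second fundamental form Pi[a][b].

    lhs: sum_{a,b,c,d} <Pi_ab, Pi_cd> (d_ab d_cd + d_ac d_bd + d_ad d_bc)
    R:   scalar curvature from the Gauss equations,
         sum_{a,b} <Pi_aa, Pi_bb> - <Pi_ab, Pi_ab>
    rhs: -2 R + 3 ||sum_a Pi_aa||^2
    """
    m = len(Pi)
    idx = range(m)
    lhs = sum(
        dot(Pi[a][b], Pi[c][d])
        * (delta(a, b) * delta(c, d) + delta(a, c) * delta(b, d) + delta(a, d) * delta(b, c))
        for a in idx for b in idx for c in idx for d in idx
    )
    R = sum(dot(Pi[a][a], Pi[b][b]) - dot(Pi[a][b], Pi[a][b]) for a in idx for b in idx)
    # Trace of Pi: component-wise sum of the diagonal normal vectors.
    trace = [sum(Pi[a][a][n] for a in idx) for n in range(len(Pi[0][0]))]
    rhs = -2.0 * R + 3.0 * dot(trace, trace)
    return lhs, rhs, R

if __name__ == "__main__":
    sphere = [[[-1.0], [0.0]], [[0.0], [-1.0]]]  # unit 2-sphere, one normal direction
    print(contraction_terms(sphere))
```

For the unit sphere both sides of the identity evaluate to 8, with $R = 2$ and $\|\sum_a \Pi(\partial_a, \partial_a)\|^2 = 4$.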



Lemma 43 Let $k$ have compact support on $[0, R_k^2]$ and let $0 < h \le h_{\max}$. Then for any $x \in M$ there exist constants $D_1, D_2 > 0$ independent of $h$ such that for any $y \in B_{\mathbb{R}^d}(x, hR_k) \cap M$,

$$0 < D_1 \le p_h(y) \le D_2.$$
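To make the claim of Lemma 43 concrete, one can evaluate the smoothed density on a toy example. The following sketch rests on assumptions that are not in the text: $M$ is the unit circle in $\mathbb{R}^2$ (so $m = 1$), $p$ is the uniform density $1/(2\pi)$, the kernel is the hypothetical choice $k(t) = \max(0, 1 - t)$ applied to squared distances (so $R_k = 1$), and $p_h(y) = h^{-1}\int k(\|y - z\|^2/h^2)\,p(z)\,dz$. By rotational symmetry the value does not depend on $y$.

```python
import math

def kernel(t):
    """Hypothetical kernel with compact support [0, 1], applied to squared distances."""
    return max(0.0, 1.0 - t)

def smoothed_density(h, n=200000):
    """Midpoint rule for p_h(y) on the unit circle with uniform density, y at angle 0."""
    p = 1.0 / (2.0 * math.pi)
    dtheta = 2.0 * math.pi / n
    total = 0.0
    for j in range(n):
        theta = -math.pi + (j + 0.5) * dtheta
        sq_dist = 2.0 - 2.0 * math.cos(theta)  # squared chord length to the point at angle theta
        total += kernel(sq_dist / h ** 2)
    return total * p * dtheta / h

if __name__ == "__main__":
    for h in (0.05, 0.5, 1.0, 2.0):
        print(f"h={h:4.2f}  p_h={smoothed_density(h):.4f}")
```

In this example $p_h$ stays between roughly 0.21 and 0.25 for all $0 < h \le 2$, so constants such as $D_1 = 0.2$ and $D_2 = 0.26$ would work; the lemma asserts that bounds of this kind exist in general.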

Proof: First suppose that $hR_k < s := \min\{\kappa/2, R_0/2\}$. Since $\|y - z\| \le hR_k \le \kappa/2$ we have by Lemma 16 $\frac{1}{2} d_M(y,z) \le \|y - z\| \le d_M(y,z)$. Moreover, since $p(x) > 0$ on $M$ and $p$ is bounded and continuous, there exist lower and upper bounds $p_{\min}$ and $p_{\max}$ on the density on $B_M(x, 4hR_k)$. That implies

$$p_h(y) \le \frac{\|k\|_\infty}{h^m}\,p_{\max}\int_{B_M(y,2hR_k)}\sqrt{\det g}\,dz \le \|k\|_\infty\,p_{\max}\,S_2\,2^m R_k^m,$$

where the last inequality follows from Lemma [T^ Note further that dM{x,y) < 2hRk and 
$d_M(y,z) \le 2hR_k$ implies $d_M(x,z) \le 4hR_k$. Since the kernel function is continuous there exists an $r_k$ such that $k(x) \ge \|k\|_\infty/2$ for $0 < x \le r_k$. We get

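$$p_h(y) \ge \frac{\|k\|_\infty}{2h^m}\int_{B_M(y,hr_k)} p(z)\sqrt{\det g}\,dz \ge \frac{\|k\|_\infty}{2h^m}\,p_{\min}\,\mathrm{vol}_M\big(B_M(y,hr_k)\big) \ge \ldots$$

Now suppose $s \le hR_k$ and $h \le h_{\max}$. Then $p_h(y) \le \frac{\|k\|_\infty}{h^m} \le \|k\|_\infty\left(\frac{R_k}{s}\right)^m$. For the lower bound we get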



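$$p_h(y) \ge \int_M k_h\big(d_M(y,z)\big)\,p(z)\sqrt{\det g}\,dz \ge \int_{B_M(y,hr_k)} k_h\big(d_M(y,z)\big)\,p(z)\sqrt{\det g}\,dz$$
$$\ge \frac{\|k\|_\infty}{2h^m}\,P\big(B_M(y,hr_k)\big) \ge \frac{\|k\|_\infty}{2h_{\max}^m}\,P\Big(B_M\big(y, \tfrac{s\,r_k}{R_k}\big)\Big).$$

Since $p$ is continuous and $p > 0$, the function $y \mapsto P\big(B_M(y, \tfrac{s\,r_k}{R_k})\big)$ is continuous and positive and therefore has a lower bound greater than zero on the ball $B_{\mathbb{R}^d}(x, hR_k) \cap M$. □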